Abstract

Let $\bar{A}$ be an arbitrary matrix and let $A$ be a slight random perturbation of $\bar{A}$. We prove
that it is unlikely that $A$ has large condition number. Using this result, we prove it is unlikely
that $A$ has large growth factor under Gaussian elimination without pivoting. By combining
these results, we show that the smoothed precision necessary to solve $Ax = b$, for any $b$,
using Gaussian elimination without pivoting is logarithmic. Moreover, when $\bar{A}$ is an all-zero
square matrix, our results significantly improve the average-case analysis of Gaussian elimination
without pivoting performed by Yeung and Chan (SIAM J. Matrix Anal. Appl., 1997).
1 Introduction



Spielman and Teng [ST04] introduced the smoothed analysis of algorithms to explain the success
of algorithms and heuristics that could not be well understood through traditional worst-case and 
average-case analyses. Smoothed analysis is a hybrid of worst-case and average-case analyses in 
which one measures the maximum over inputs of the expected value of a measure of the perfor- 
mance of an algorithm on slight random perturbations of that input. For example, the smoothed 
complexity of an algorithm is the maximum over its inputs of the expected running time of the 
algorithm under slight perturbations of that input. If an algorithm has low smoothed complexity 
and its inputs are subject to noise, then it is unlikely that one will encounter an input on which 
the algorithm performs poorly. (See also the Smoothed Analysis Homepage [Smo].)
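The definition above can be sketched in code. The following is only an illustrative estimator, not part of the paper: the names `smoothed_value`, `perturb` and `measure` are hypothetical, the maximum is taken over a finite list of candidate inputs rather than over all inputs, and the expectation is approximated by sampling.

```python
import random

def smoothed_value(inputs, perturb, measure, trials=200, seed=0):
    """Approximate max over inputs of E[measure(perturb(input))].

    inputs  -- finite list of candidate inputs (an assumption: the true
               definition maximizes over all inputs)
    perturb -- function (input, rng) -> randomly perturbed input
    measure -- function giving the performance measure of one instance
    """
    rng = random.Random(seed)
    worst = float("-inf")
    for x in inputs:
        total = 0.0
        for _ in range(trials):
            total += measure(perturb(x, rng))
        # Sample mean approximates the expectation for this input.
        worst = max(worst, total / trials)
    return worst
```

With a perturbation that adds no noise, the estimator reduces to the ordinary worst case over the candidate list.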

Smoothed analysis is motivated by the existence of algorithms and heuristics that are known 
to work well in practice, but which are known to have poor worst-case performance. Average-case 
analysis was introduced in an attempt to explain the success of such heuristics. However,
average-case analyses are often unsatisfying as the random inputs they consider may bear little resemblance
to the inputs actually encountered in practice. Smoothed analysis attempts to overcome this 
objection by proving a bound that holds in every neighborhood of inputs. 

In this paper, we prove that perturbations of arbitrary matrices are unlikely to have large 
condition numbers or large growth factors under Gaussian Elimination without pivoting. As a 
consequence, we conclude that the smoothed precision necessary for Gaussian elimination is log- 
arithmic. We obtain similar results for perturbations that affect only the non-zero and diagonal 
entries of symmetric matrices. We hope that these results will be a first step toward a smoothed 
analysis of Gaussian elimination with partial pivoting — an algorithm that is widely used in practice 
but known to have poor worst-case performance. 

In the rest of this section, we recall the definitions of the condition numbers and growth factors of 
matrices, and review prior work on their average-case analysis. In Section 3 we perform a smoothed
analysis of the condition number of a matrix. In Section 4 we use the results of Section 3 to obtain a
smoothed analysis of the growth factors of Gaussian elimination without pivoting. In Section 5 we
combine these results to obtain a smoothed bound on the precision needed by Gaussian elimination
without pivoting. Definitions of zero-preserving perturbations and our results on perturbations that
only affect the non-zero and diagonal entries of symmetric matrices appear in Section 6. In the
conclusion section, we explain how our results may be extended to larger families of perturbations, 
present some counter-examples, and suggest future directions for research. Other conjectures and 
open questions appear in the body of the paper. 

The analysis in this paper requires many results from probability. Where reasonable, these have 
been deferred to the appendix. 
1.1 Condition numbers and growth factors



We use the standard notation for the 1-, 2- and $\infty$-norms of matrices and column vectors, and define

$$\|A\|_{\max} = \max_{i,j} |A_{i,j}|.$$

Definition 1.1 (Condition Number). For a square matrix $A$, the condition number of $A$ is
defined by

$$\kappa(A) = \|A\|_2 \left\|A^{-1}\right\|_2.$$



The condition number measures how much the solution to a system Ax = b changes as one 
makes slight changes to $A$ and $b$. A consequence is that if one solves the linear system using
fewer than $\log(\kappa(A))$ bits of precision, one is likely to obtain a result far from a solution. For more
information on the condition number of a matrix, we refer the reader to one of [GL83, TB97, Dem97].
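As a small illustration of Definition 1.1, the sketch below computes the 2-norm condition number of a 2-by-2 matrix from its closed-form singular values and reports $\log_2 \kappa$. The function name `cond2_2x2` is hypothetical, and the restriction to 2-by-2 matrices is only to keep the example free of external libraries.

```python
import math

def cond2_2x2(a, b, c, d):
    """2-norm condition number of [[a, b], [c, d]].

    Uses s_max^2 + s_min^2 = ||A||_F^2 and s_max * s_min = |det A|,
    which hold for the singular values of any 2x2 matrix.
    """
    det = a * d - b * c
    if det == 0.0:
        return math.inf  # singular matrix
    frob2 = a * a + b * b + c * c + d * d
    disc = max(frob2 * frob2 - 4.0 * det * det, 0.0)
    s_max = math.sqrt((frob2 + math.sqrt(disc)) / 2.0)
    s_min = abs(det) / s_max  # avoids cancellation in the small root
    return s_max / s_min

kappa = cond2_2x2(1.0, 0.0, 0.0, 1e-3)
bits_lost = math.log2(kappa)  # roughly the precision consumed by conditioning
```

For the diagonal example the condition number is the ratio of the two diagonal entries, so about ten bits are at stake.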

The simplest and most often implemented method of solving linear systems is Gaussian
elimination. Natural implementations of Gaussian elimination use $O(n^3)$ arithmetic operations to solve a
system of $n$ linear equations in $n$ variables. If the coefficients of these equations are specified using
$b$ bits, in the worst case it suffices to perform the elimination using $O(bn)$ bits of precision [GLS91].
This high precision may be necessary because the elimination may produce large intermediate
entries [TB97]. However, in practice one usually obtains accurate answers using much less precision.
In fact, it is rare to find an implementation of Gaussian elimination that uses anything more than
double precision, and high-precision solvers are rarely used or needed in practice [TB97, TS90]
(for example, LAPACK uses 64 bits [ABB+99]). One of the main results of this paper is that
$O(b + \log n)$ bits of precision usually suffice for Gaussian elimination in the smoothed analysis
framework.

Since Wilkinson's seminal work [Wil61], it has been understood that it suffices to carry out
Gaussian elimination with $b + \log_2(5 n \kappa(A) \|L\|_\infty \|U\|_\infty / \|A\|_\infty + 3)$ bits of accuracy to obtain a
solution that is accurate to $b$ bits. In this formula, $L$ and $U$ are the LU-decomposition of $A$; that
is, $U$ is the upper-triangular matrix and $L$ is the lower-triangular matrix with 1s on the diagonal
for which $A = LU$.
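The precision estimate quoted above is easy to evaluate once the norms are known. The helper below is a minimal sketch; the name `wilkinson_bits` and its argument names are illustrative and not taken from the paper.

```python
import math

def wilkinson_bits(b, n, kappa, norm_l, norm_u, norm_a):
    """Bits of accuracy b + log2(5 n kappa ||L|| ||U|| / ||A|| + 3).

    b       -- desired bits of accuracy in the solution
    n       -- dimension of the system
    kappa   -- condition number of A
    norm_l, norm_u, norm_a -- infinity norms of L, U and A
    """
    return b + math.log2(5.0 * n * kappa * norm_l * norm_u / norm_a + 3.0)
```

For instance, with all norms and the condition number equal to 1 and $n = 1$, the argument of the logarithm is 8, so three bits are added to $b$.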



1.2 Prior work 

The average-case behaviors of the condition numbers and growth factors of matrices have been 
studied both analytically and experimentally. In his paper, "The probability that a numerical 
analysis problem is difficult", Demmel [Dem88] proved that it is unlikely that a Gaussian random
matrix centered at the origin has large condition number. Demmel's bounds on the condition
number were improved by Edelman [Ede88].

Average-case analysis of growth factors began with the experimental work of Trefethen and
Schreiber [TS90], who found that Gaussian random matrices rarely have large growth factors under
partial or full pivoting.
Definition 1.2 (Gaussian Matrix). A matrix $G$ is a Gaussian random matrix of variance $\sigma^2$ if
each entry of $G$ is an independent univariate Gaussian variable with mean 0 and standard deviation
$\sigma$.

Yeung and Chan [YC97] study the growth factors of Gaussian elimination without pivoting on
Gaussian random matrices of variance 1. They define $\rho_U$ and $\rho_L$ by

$$\rho_U(A) = \|U\|_\infty / \|A\|_\infty, \quad \text{and}$$
$$\rho_L(A) = \|L\|_\infty,$$

where $A = LU$ is the LU-factorization of $A$ obtained without pivoting. They prove

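Theorem 1.3 (Yeung-Chan). There exist constants $c > 0$ and $0 < b < 1$ such that if $G$ is
an $n \times n$ Gaussian random matrix of variance 1 and $G = LU$ is the LU-factorization of $G$, then

$$\Pr[\rho_L(G) > x] \le \frac{c n^3}{x}, \quad \text{and}$$
$$\Pr[\rho_U(G) > x] \le \min\left(\frac{c n^{7/2}}{x}, \frac{1}{n}\right) + \frac{c n^{5/2}}{x} + b^n.$$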

As it is generally believed that partial pivoting is better than no pivoting, their result pro- 
vides some intuition for the experimental results of Trefethen and Schreiber demonstrating that 
random matrices rarely have large growth factors under partial pivoting. However, we note that 
it is difficult to make this intuition rigorous as there are matrices A for which no pivoting has 
$\|L\|_{\max} \|U\|_{\max} / \|A\|_{\max} = 2$ while partial pivoting has growth factor $2^n$. (See also [Hig90].)

The running times of many numerical algorithms depend on the condition numbers of their 
inputs. For example, the number of iterations taken by the method of conjugate gradients can 
be bounded in terms of the square root of the condition number. Similarly, the running times of 
interior-point methods can be bounded in terms of condition numbers [Ren95]. Blum [Blu89]
suggested that a complexity theory of numerical algorithms should be parameterized by the condition
number of an input in addition to the input size. Smale [Sma97] proposed a complexity theory of
numerical algorithms in which one: 

1. proves a bound on the running time of an algorithm solving a problem in terms of its condition 
number, and then 

2. proves that it is unlikely that a random problem instance has large condition number. 
This program is analogous to the average-case complexity of Theoretical Computer Science. 
1.3 Our results

To better model the inputs that occur in practice, we propose replacing step 2 of Smale's program 
with 

2'. prove that for every input instance it is unlikely that a slight random perturbation of that 
instance has large condition number. 

That is, we propose to bound the smoothed value of the condition number. Our first result in 
this program is presented in Section 3, where we improve upon Demmel's [Dem88] and
Edelman's [Ede88] average-case results to show that a slight Gaussian perturbation of an arbitrary
matrix is unlikely to have large condition number.

Definition 1.4 (Gaussian Perturbation). Let $\bar{A}$ be an arbitrary $n \times n$ matrix. The matrix $A$
is a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2$ if $A$ can be written as $A = \bar{A} + G$, where $G$ is a
Gaussian random matrix of variance $\sigma^2$. We also refer to $A$ as a Gaussian matrix of variance $\sigma^2$
centered at $\bar{A}$.
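Definition 1.4 translates directly into code. The sketch below is illustrative only: the name `gaussian_perturbation` is hypothetical, matrices are represented as lists of rows, and Python's standard `random.gauss` supplies the independent Gaussian entries.

```python
import random

def gaussian_perturbation(a_bar, sigma, rng=None):
    """Return A = a_bar + G, where G has independent N(0, sigma^2) entries.

    a_bar -- the center, given as a list of rows
    sigma -- standard deviation of each entry of G
    rng   -- optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    return [[entry + rng.gauss(0.0, sigma) for entry in row] for row in a_bar]
```

Setting `sigma` to 0 returns the center unchanged, and taking the all-zero center recovers a Gaussian random matrix in the sense of Definition 1.2.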

In our smoothed analysis of the condition number, we consider an arbitrary $n \times n$ matrix $\bar{A}$
of norm at most $\sqrt{n}$, and we bound the probability that $\kappa(\bar{A} + G)$, the condition number of its
Gaussian perturbation, is large, where $G$ is a Gaussian random matrix of variance $\sigma^2 \le 1$. We
bound this probability in terms of $\sigma$ and $n$. In contrast with the average-case analysis of Demmel
and Edelman, our analysis can be interpreted as demonstrating that if there is a little bit of
imprecision or noise in the entries of a matrix, then it is unlikely it is ill-conditioned. On the other
hand, Edelman [Ede92] writes of random matrices:

What is a mistake is to psychologically link a random matrix with the intuitive 
notion of a "typical" matrix or the vague concept of "any old matrix." 

The reader might also be interested in recent work on the smoothed analysis of the condition
numbers of linear programs [BD02, DST02, ST03].

In Section 4, we use results from Section 3 to perform a smoothed analysis of the growth factors
of Gaussian elimination without pivoting. If one specializes our results to perturbations of an
all-zero square matrix, then one obtains a bound on $\rho_U$ that improves the bound obtained by Yeung
and Chan by a factor of $n$ and which agrees with their experimental observations. The result
obtained for $\rho_L$ also improves the bound of Yeung and Chan [YC97] by a factor of $n$. However,
while Yeung and Chan compute the density functions of the distribution of the elements in L and 
U, such precise estimates are not immediately available in our model. As a result, the techniques 
we develop are applicable to a wide variety of models of perturbations beyond the Gaussian. For 
example, one could use our techniques to obtain results of a similar nature if G were a matrix of 
random variables chosen uniformly in $[-1, 1]$. We comment further upon this in the conclusions
section of the paper. 
The less effect a perturbation has, the more meaningful the results of smoothed analysis are.
As many matrices encountered in practice are sparse or have structure, it would be best to consider
perturbations that respect their sparsity pattern or structure. Our first result in this direction
appears in Section 6, in which we consider the condition numbers and growth factors of
perturbations of symmetric matrices that only alter their non-zero and diagonal elements. We prove results
similar to those proved for dense perturbations of arbitrary matrices. 

2 Notation and Mathematical Preliminaries 

We use bold lower-case Roman letters such as $\mathbf{x}$, $\mathbf{a}$, $\mathbf{b}_j$ to denote vectors in $\mathbb{R}^{?}$. Whenever a vector,
say $\mathbf{a} \in \mathbb{R}^n$, is present, its components will be denoted by lower-case Roman letters with subscripts,
such as $a_1, \ldots, a_n$. Matrices are denoted by bold upper-case Roman letters such as $\mathbf{A}$ and scalars
are denoted by lower-case Roman letters. Indicator random variables and random event variables
are denoted by upper-case Roman letters. Random variables taking real values are denoted by 
upper-case Roman letters, except when they are components of a random vector or matrix. 

The probability of an event A is written Pr [A], and the expectation of a variable X is written 
E [X]. The indicator random variable for an event A is written [A]. 

We write $\ln$ to denote the natural logarithm, base $e$, and explicitly write the base for all other
logarithms. 

For integers $a \le b$, we let $a : b$ denote the set of integers $\{x : a \le x \le b\}$. For a matrix $A$ we let
$A_{a:b,c:d}$ denote the submatrix of $A$ indexed by rows in $a : b$ and columns in $c : d$.

We will bound many probabilities by applying the following proposition.

Proposition 2.1 (Minimum $\le$ Average $\le$ Maximum). Let $\mu(X, Y)$ be a non-negative
integrable function, and let $X$ and $Y$ be random variables distributed according to $\mu(X, Y)$. If $A(X, Y)$
is an event and $F(X, Y)$ is a function, then

$$\min_X \Pr_Y[A(X, Y)] \le \Pr_{X,Y}[A(X, Y)] \le \max_X \Pr_Y[A(X, Y)], \quad \text{and}$$
$$\min_X \mathrm{E}_Y[F(X, Y)] \le \mathrm{E}_{X,Y}[F(X, Y)] \le \max_X \mathrm{E}_Y[F(X, Y)],$$

where in the left-hand and right-hand terms, $Y$ is distributed according to the induced distribution
on $\mu(X, Y)$.

We recall that a matrix Q is an orthonormal matrix if its inverse is equal to its transpose, that 
is, $Q^T Q = I$. In Section 3 we will use the following proposition.

Proposition 2.2 (Orthonormal Transformation of Gaussian). Let $\bar{A}$ be a matrix in $\mathbb{R}^{n \times n}$
and $Q$ be an orthonormal matrix in $\mathbb{R}^{n \times n}$. If $A$ is a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2$,
then $QA$ is a Gaussian perturbation of $Q\bar{A}$ of variance $\sigma^2$.

We will also use the following extension of Proposition 2.17 of [ST04].
Proposition 2.3 (Gaussian Measure of Halfspaces). Let $t$ be any unit vector in $\mathbb{R}^n$ and $r$ be
any real. Let $\bar{b}$ be a vector in $\mathbb{R}^n$ and $b$ be a Gaussian perturbation of $\bar{b}$ of variance $\sigma^2$. Then

$$\Pr_b\left[\left|t^T b\right| \le r\right] \le \frac{1}{\sqrt{2\pi}\,\sigma} \int_{t=-r}^{t=r} e^{-t^2/2\sigma^2} \, dt.$$



In this paper we will use the following properties of matrix norms and vector norms.

Proposition 2.4 (Product). For any pair of matrices $A$ and $B$ such that $AB$ is defined, and for
every $1 \le p \le \infty$,

$$\|AB\|_p \le \|A\|_p \|B\|_p.$$

Proposition 2.5 (Vector Norms). For any column vector $a$ in $\mathbb{R}^n$, $\|a\|_1 / \sqrt{n} \le \|a\|_2 \le \|a\|_1$.

Proposition 2.6 (2-norm). For any matrix $A$,

$$\|A\|_2 = \left\|A^T\right\|_2,$$

as both are equal to the largest eigenvalue of $\sqrt{A^T A}$.

Proposition 2.7 ($\|A\|_\infty$: the maximum absolute row sum norm). For every matrix $A$,

$$\|A\|_\infty = \max_i \|a_i\|_1,$$

where $a_1, \ldots, a_n$ are the rows of $A$. Thus, for any submatrix $D$ of $A$,

$$\|D\|_\infty \le \|A\|_\infty.$$

Proposition 2.8 ($\|A\|_1$: the maximum absolute column sum norm). For every matrix $A$,

$$\|A\|_1 = \max_i \|a_i\|_1,$$

where $a_1, \ldots, a_n$ are the columns of $A$. Thus

$$\|A\|_1 = \left\|A^T\right\|_\infty.$$

3 Smoothed analysis of the condition number of a matrix 

In this section, we will prove the following theorem which shows that for every matrix it is unlikely 
that a slight perturbation of that matrix has large condition number. 
Theorem 3.1 (Smoothed Analysis of Condition number). Let $\bar{A}$ be an $n \times n$ matrix
satisfying $\|\bar{A}\|_2 \le \sqrt{n}$, and let $A$ be a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2 \le 1$. Then, $\forall x \ge 1$,

$$\Pr[\kappa(A) \ge x] \le \frac{14.1\, n \left(1 + \sqrt{2\ln(x)/9n}\right)}{x\sigma}.$$
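To get a feel for the magnitude of the bound in Theorem 3.1, the right-hand side can be evaluated numerically. The function below is a minimal sketch; the name `condition_tail_bound` is illustrative, and the function simply evaluates the stated expression without checking anything about a particular matrix.

```python
import math

def condition_tail_bound(n, sigma, x):
    """Right-hand side of Theorem 3.1: 14.1 n (1 + sqrt(2 ln x / 9n)) / (x sigma).

    Requires x >= 1 and 0 < sigma <= 1, matching the theorem's hypotheses.
    Values above 1 carry no information, since they bound a probability.
    """
    if x < 1.0 or not (0.0 < sigma <= 1.0):
        raise ValueError("need x >= 1 and 0 < sigma <= 1")
    return 14.1 * n * (1.0 + math.sqrt(2.0 * math.log(x) / (9.0 * n))) / (x * sigma)
```

For example, with $n = 10$ and $\sigma = 0.1$, the bound only drops below 1 once $x$ is in the thousands, and it is roughly $2 \times 10^{-3}$ at $x = 10^6$.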



As bounds on the norm of a random matrix are standard, we focus on the norm of the inverse.
Recall that $1 / \|A^{-1}\|_2 = \min_x \|Ax\|_2 / \|x\|_2$.

The first step in the proof is to bound the probability that $\|A^{-1} v\|_2$ is large for a fixed unit
vector $v$. This result is also used later (in Section 4.1) in studying the growth factor. Using this
result and an averaging argument, we then bound the probability that $\|A^{-1}\|_2$ is large.

Lemma 3.2 (Projection of $A^{-1}$). Let $\bar{A}$ be an arbitrary square matrix in $\mathbb{R}^{n \times n}$, and let $A$ be
a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2$. Let $v$ be an arbitrary unit vector. Then

$$\Pr\left[\left\|A^{-1} v\right\|_2 > x\right] < \sqrt{\frac{2}{\pi}} \frac{1}{x\sigma}.$$

Proof. Let $Q$ be an orthonormal matrix such that $Q^T e_1 = v$. Let $\bar{B} = Q\bar{A}$ and $B = QA$. By
Proposition 2.2, $B$ is a Gaussian perturbation of $\bar{B}$ of variance $\sigma^2$. We have

$$\left\|A^{-1} v\right\|_2 = \left\|A^{-1} Q^T e_1\right\|_2 = \left\|(QA)^{-1} e_1\right\|_2 = \left\|B^{-1} e_1\right\|_2.$$

Thus, to prove the lemma it is sufficient to show

$$\Pr_B\left[\left\|B^{-1} e_1\right\|_2 > x\right] < \sqrt{\frac{2}{\pi}} \frac{1}{x\sigma}.$$

We observe that

$$\left\|B^{-1} e_1\right\|_2 = \left\|(B^{-1})_{:,1}\right\|_2,$$

the length of the first column of $B^{-1}$. The first column of $B^{-1}$, by the definition of the matrix inverse,
is the vector that is orthogonal to every row of $B$ but the first and that has inner product 1 with
the first row of $B$. Hence its length is the reciprocal of the length of the projection of the first row
of $B$ onto the subspace orthogonal to the rest of the rows.

Let $\bar{b}_1, \ldots, \bar{b}_n$ be the rows of $\bar{B}$ and $b_1, \ldots, b_n$ be the rows of $B$. Note that $b_1$ is a Gaussian
perturbation of $\bar{b}_1$ of variance $\sigma^2$. Let $t$ be the unit vector that is orthogonal to the span of
$b_2, \ldots, b_n$. Then

$$\left\|(B^{-1})_{:,1}\right\|_2 = \frac{1}{\left|t^T b_1\right|}.$$

Thus,

$$\Pr_B\left[\left\|B^{-1} e_1\right\|_2 > x\right] = \Pr_{b_1, \ldots, b_n}\left[\frac{1}{\left|t^T b_1\right|} > x\right] \le \max_{b_2, \ldots, b_n} \Pr_{b_1}\left[\left|t^T b_1\right| < 1/x\right] < \sqrt{\frac{2}{\pi}} \frac{1}{x\sigma},$$

where the first inequality follows from Proposition 2.1 and the second inequality follows from Lemma A.2. □
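The geometric fact used in the proof, that the first column of $B^{-1}$ has length $1 / |t^T b_1|$, can be checked by hand in the 2-by-2 case. The sketch below does so with two hypothetical helpers, `first_column_length_of_inverse` and `reciprocal_projection`; it illustrates the identity and plays no role in the proof.

```python
import math

def first_column_length_of_inverse(b1, b2):
    """Length of the first column of B^{-1} for the 2x2 matrix B with rows b1, b2."""
    det = b1[0] * b2[1] - b1[1] * b2[0]
    # For B = [[p, q], [r, s]], B^{-1} = (1/det) [[s, -q], [-r, p]],
    # so its first column is (s, -r) / det = (b2[1], -b2[0]) / det.
    return math.hypot(b2[1], -b2[0]) / abs(det)

def reciprocal_projection(b1, b2):
    """1 / |t^T b1|, where t is the unit vector orthogonal to the row b2."""
    norm_b2 = math.hypot(b2[0], b2[1])
    t = (b2[1] / norm_b2, -b2[0] / norm_b2)
    return 1.0 / abs(t[0] * b1[0] + t[1] * b1[1])
```

Both functions return the same number for any invertible 2-by-2 matrix; with rows $(3, 1)$ and $(1, 2)$ the common value is $\sqrt{5}/5$.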

Theorem 3.3 (Smallest singular value). Let $\bar{A}$ be an arbitrary square matrix in $\mathbb{R}^{n \times n}$, and
let $A$ be a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2$. Then

$$\Pr_A\left[\left\|A^{-1}\right\|_2 \ge x\right] \le 2.35 \frac{\sqrt{n}}{x\sigma}.$$



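Proof. Let $v$ be a uniformly distributed random unit vector in $\mathbb{R}^n$. It follows from Lemma 3.2 that

$$\Pr_{A,v}\left[\left\|A^{-1} v\right\|_2 \ge x\right] \le \sqrt{\frac{2}{\pi}} \frac{1}{x\sigma}. \qquad (3.1)$$

Since $A$ is a Gaussian perturbation of $\bar{A}$, with probability 1 there is a unique pair $(u, -u)$ of
unit vectors such that $\|A^{-1} u\|_2 = \|A^{-1}\|_2$. From the inequality

$$\left\|A^{-1} v\right\|_2 \ge \left\|A^{-1}\right\|_2 \left|u^T v\right|,$$

we know that for every $c > 0$,

$$\begin{aligned}
\Pr_{A,v}\left[\left\|A^{-1} v\right\|_2 \ge x\sqrt{c/n}\right]
&\ge \Pr_{A,v}\left[\left\|A^{-1}\right\|_2 \ge x \text{ and } \left|u^T v\right| \ge \sqrt{c/n}\right] \\
&= \Pr_A\left[\left\|A^{-1}\right\|_2 \ge x\right] \Pr_{A,v}\left[\left|u^T v\right| \ge \sqrt{c/n} \;\Big|\; \left\|A^{-1}\right\|_2 \ge x\right] \\
&\ge \Pr_A\left[\left\|A^{-1}\right\|_2 \ge x\right] \min_{A : \|A^{-1}\|_2 \ge x} \Pr_v\left[\left|u^T v\right| \ge \sqrt{c/n}\right] && \text{(by Proposition 2.1)} \\
&\ge \Pr_A\left[\left\|A^{-1}\right\|_2 \ge x\right] \Pr_G\left[|G| \ge \sqrt{c}\right], && \text{(by Lemma B.1)}
\end{aligned}$$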



where $G$ is a Gaussian random variable with mean 0 and variance 1. To prove this last inequality,
we first note that $v$ is a random unit vector and is independent from $u$. Thus, in a basis of $\mathbb{R}^n$
in which $u$ is the first vector, $v$ is a uniformly distributed random unit vector with the first
coordinate equal to $u^T v$, and so we may apply Lemma B.1 to bound $\Pr_v\left[|u^T v| \ge \sqrt{c/n}\right]$ from
below by $\Pr_G\left[|G| \ge \sqrt{c}\right]$. So,

$$\Pr_A\left[\left\|A^{-1}\right\|_2 \ge x\right] \le \frac{\Pr_{A,v}\left[\left\|A^{-1} v\right\|_2 \ge x\sqrt{c/n}\right]}{\Pr_G\left[|G| \ge \sqrt{c}\right]} \le \sqrt{\frac{2}{\pi}} \frac{\sqrt{n}}{x\sigma\sqrt{c}\,\Pr_G\left[|G| \ge \sqrt{c}\right]}. \qquad \text{(by (3.1))}$$



Because this inequality is true for every $c$, we will choose a value for $c$ that almost maximizes
$\sqrt{c}\,\Pr_G\left[|G| \ge \sqrt{c}\right]$ and which in turn almost minimizes the right hand side.

Choosing $c = 0.57$, and evaluating the error function numerically, we determine

$$\Pr_A\left[\left\|A^{-1}\right\|_2 \ge x\right] \le 2.35 \frac{\sqrt{n}}{x\sigma}.$$

□
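The numerical step in this proof can be reproduced with the complementary error function from Python's standard library. In the sketch below, `inverse_norm_constant` is an illustrative name for the factor $\sqrt{2/\pi} / (\sqrt{c}\,\Pr_G[|G| \ge \sqrt{c}])$ that multiplies $\sqrt{n}/(x\sigma)$; it uses the identity $\Pr_G[|G| \ge a] = \operatorname{erfc}(a/\sqrt{2})$ for a standard Gaussian $G$.

```python
import math

def inverse_norm_constant(c):
    """Constant sqrt(2/pi) / (sqrt(c) * Pr[|G| >= sqrt(c)]) for standard normal G."""
    tail = math.erfc(math.sqrt(c / 2.0))  # Pr[|G| >= sqrt(c)]
    return math.sqrt(2.0 / math.pi) / (math.sqrt(c) * tail)

constant_at_057 = inverse_norm_constant(0.57)  # just under 2.35
```

Evaluating the function at nearby values of `c` (for example 0.3 or 1.0) gives larger constants, consistent with 0.57 being close to the minimizer.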



Note that Theorem 3.3 gives a smoothed analogue of the following bound of Edelman [Ede88]
on Gaussian random matrices.

Theorem 3.4 (Edelman). Let $G \in \mathbb{R}^{n \times n}$ be a Gaussian random matrix with variance $\sigma^2$, then

$$\Pr_G\left[\left\|G^{-1}\right\|_2 \ge x\right] \le \frac{\sqrt{n}}{x\sigma}.$$



As Gaussian random matrices can be viewed as Gaussian random perturbations of the $n \times n$
all-zero square matrix, Theorem 3.3 extends Edelman's theorem to Gaussian random perturbations
of an arbitrary matrix. The constant 2.35 in Theorem 3.3 is bigger than Edelman's 1 for Gaussian
random matrices. We conjecture that it is possible to reduce 2.35 in Theorem 3.3 to 1 as well.

Conjecture 1 (Smallest Singular Value). Let $\bar{A}$ be an arbitrary square matrix in $\mathbb{R}^{n \times n}$, and
let $A$ be a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2$. Then

$$\Pr_A\left[\left\|A^{-1}\right\|_2 \ge x\right] \le \frac{\sqrt{n}}{x\sigma}.$$



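We now apply Theorem 3.3 to prove Theorem 3.1.

Proof of Theorem 3.1. As observed by Davidson and Szarek [DS01, Theorem II.7], one can apply
inequality (1.4) of [LT91] to show that for all $k \ge 0$,

$$\Pr\left[\left\|A - \bar{A}\right\|_2 \ge \sigma\left(2\sqrt{n} + k\right)\right] \le e^{-k^2/2}.$$

Replacing $\sigma$ by its upper bound of 1 and setting $\epsilon = e^{-k^2/2}$, we obtain

$$\Pr_A\left[\left\|A - \bar{A}\right\|_2 \ge 2\sqrt{n} + \sqrt{2\ln(1/\epsilon)}\right] \le \epsilon,$$

for all $\epsilon \le 1$. By assumption, $\|\bar{A}\|_2 \le \sqrt{n}$; so,

$$\Pr_A\left[\left\|A\right\|_2 \ge 3\sqrt{n} + \sqrt{2\ln(1/\epsilon)}\right] \le \epsilon.$$

From the result of Theorem 3.3, we have

$$\Pr_A\left[\left\|A^{-1}\right\|_2 \ge \frac{2.35\sqrt{n}}{\epsilon\sigma}\right] \le \epsilon.$$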



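Combining these two bounds, we find

$$\Pr_A\left[\left\|A\right\|_2 \left\|A^{-1}\right\|_2 \ge \frac{7.05 n + 2.35\sqrt{2n\ln(1/\epsilon)}}{\epsilon\sigma}\right] \le 2\epsilon.$$

So that we can express this probability in the form of $\Pr_A\left[\|A\|_2 \|A^{-1}\|_2 \ge x\right]$, for $x \ge 1$, we let

$$x = \frac{7.05 n + 2.35\sqrt{2n\ln(1/\epsilon)}}{\epsilon\sigma}. \qquad (3.2)$$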



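It follows from Equation (3.2) and the assumption $\sigma \le 1$ that $x\epsilon \ge 1$, implying $\ln(1/\epsilon) \le \ln x$. From
Equation (3.2), we derive

$$2\epsilon = \frac{2\left(7.05 n + 2.35\sqrt{2n\ln(1/\epsilon)}\right)}{x\sigma} \le \frac{2\left(7.05 n + 2.35\sqrt{2n\ln x}\right)}{x\sigma} \le \frac{14.1\, n\left(1 + \sqrt{2\ln(x)/9n}\right)}{x\sigma}.$$

Therefore, we conclude

$$\Pr\left[\|A\|_2 \left\|A^{-1}\right\|_2 \ge x\right] \le \frac{14.1\, n\left(1 + \sqrt{2\ln(x)/9n}\right)}{x\sigma}.$$

□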



We conjecture that the $1 + \sqrt{2\ln(x)/9n}$ term should be unnecessary because those matrices for
which $\|A\|_2$ is large are less likely to have $\|A^{-1}\|_2$ large as well.
4 Growth Factor of Gaussian Elimination without Pivoting

We now turn to proving a bound on the growth factor. We will consider a matrix $A \in \mathbb{R}^{n \times n}$ obtained
from a Gaussian perturbation of variance $\sigma^2$ of an arbitrary matrix $\bar{A}$ satisfying $\|\bar{A}\|_2 \le 1$. With
probability 1, none of the diagonal entries that occur during elimination will be 0. So, in the
spirit of Yeung and Chan [YC97], we analyze the growth factor of Gaussian elimination without
pivoting. When we specialize our smoothed analyses to the case $\bar{A} = 0$, we improve the bounds of
Yeung and Chan (see Theorem 1.3) by a factor of $n$. Our improved bound on $\rho_U$ agrees with their
experimental analyses.



4.1 Growth in U 

We recall that

$$\rho_U(A) = \frac{\|U\|_\infty}{\|A\|_\infty}.$$

In this section, we give two bounds on $\rho_U(A)$. The first will have a better dependence on $\sigma$, and the
second will have a better dependence on $n$. It is the latter bound, Theorem 4.3, that agrees with
the experiments of Yeung and Chan [YC97] when specialized to the average-case by setting $\bar{A} = 0$
and $\sigma = 1$.
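For concreteness, the quantities $\rho_U$ and $\rho_L$ can be computed for a small matrix as follows. This is a minimal sketch of textbook Gaussian elimination without pivoting; the function names `lu_no_pivoting`, `inf_norm` and `growth_factors` are illustrative, and the code raises an error on a zero pivot rather than attempting any recovery.

```python
def lu_no_pivoting(a):
    """Return (L, U) with A = L U, L unit lower-triangular, no row exchanges."""
    n = len(a)
    u = [list(map(float, row)) for row in a]
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if u[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot encountered without pivoting")
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]          # multiplier stored in L
            for j in range(k, n):
                u[i][j] -= l[i][k] * u[k][j]     # eliminate row i using row k
    return l, u

def inf_norm(m):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in m)

def growth_factors(a):
    """Return (rho_U, rho_L) = (||U||_inf / ||A||_inf, ||L||_inf)."""
    l, u = lu_no_pivoting(a)
    return inf_norm(u) / inf_norm(a), inf_norm(l)
```

For the matrix with rows $(2, 1)$ and $(4, 5)$, elimination gives $U$ with rows $(2, 1)$ and $(0, 3)$ and a single multiplier 2, so $\rho_U = 3/9$ and $\rho_L = 3$.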



4.1.1 First bound 

Theorem 4.1 (First bound on $\rho_U(A)$). Let $\bar{A}$ be an $n \times n$ matrix satisfying $\|\bar{A}\|_2 \le 1$, and let
$A$ be a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2 \le 1$. Then,

$$\Pr[\rho_U(A) > 1 + x] \le \frac{1}{\sqrt{2\pi}} \frac{n(n+1)}{x\sigma}.$$



Proof. By Proposition 2.7,

$$\rho_U(A) = \frac{\|U\|_\infty}{\|A\|_\infty} = \max_i \frac{\left\|(U_{i,:})^T\right\|_1}{\|A\|_\infty}.$$

So, we need to bound the probability that the 1-norm of the vector defined by each row of $U$ is
large and then apply a union bound to bound the overall probability.

Fix for now a $k$ between 2 and $n$. We denote the upper triangular segment of the $k$th row of $U$
by $u^T = U_{k,k:n}$, and observe that $u$ can be obtained from the formula:

$$u^T = a^T - b^T C^{-1} D, \qquad (4.1)$$

where

$$a^T = A_{k,k:n}, \quad b^T = A_{k,1:k-1}, \quad C = A_{1:k-1,1:k-1}, \quad D = A_{1:k-1,k:n}.$$
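This expression for $u$ follows immediately from

$$A_{1:k,:} = \begin{pmatrix} C & D \\ b^T & a^T \end{pmatrix} = \begin{pmatrix} L_{1:k-1,1:k-1} & 0 \\ L_{k,1:k-1} & 1 \end{pmatrix} \begin{pmatrix} U_{1:k-1,1:k-1} & U_{1:k-1,k:n} \\ 0 & u^T \end{pmatrix}.$$

From (4.1), we derive

$$\begin{aligned}
\|u\|_1 = \left\|a - \left(b^T C^{-1} D\right)^T\right\|_1
&\le \|a\|_1 + \left\|\left(b^T C^{-1} D\right)^T\right\|_1 \\
&\le \|a\|_1 + \left\|\left(C^T\right)^{-1} b\right\|_1 \|D\|_\infty && \text{by Propositions 2.4 and 2.8} \\
&\le \|A\|_\infty \left(1 + \left\|\left(C^T\right)^{-1} b\right\|_1\right) && \text{by Proposition 2.7.} \qquad (4.2)
\end{aligned}$$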



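We now bound the probability that $\left\|\left(C^T\right)^{-1} b\right\|_1$ is large. By Proposition 2.5,

$$\left\|\left(C^T\right)^{-1} b\right\|_1 \le \sqrt{k-1}\, \left\|\left(C^T\right)^{-1} b\right\|_2.$$

Note that $b$ and $C$ are independent of each other. Therefore,

$$\Pr_{b,C}\left[\left\|\left(C^T\right)^{-1} b\right\|_1 > x\right] \le \Pr_{b,C}\left[\left\|\left(C^T\right)^{-1} b\right\|_2 > \frac{x}{\sqrt{k-1}}\right] \le \sqrt{\frac{2}{\pi}} \frac{\sqrt{k-1}\sqrt{(k-1)\sigma^2 + 1}}{x\sigma} < \sqrt{\frac{2}{\pi}} \frac{k}{x\sigma}, \qquad (4.3)$$

where the second inequality follows from Lemma 4.2 below and the last inequality follows from the
assumption $\sigma^2 \le 1$.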

We now apply a union bound over the choices of $k$ to obtain

$$\Pr[\rho_U(A) > 1 + x] < \sum_{k=2}^{n} \sqrt{\frac{2}{\pi}} \frac{k}{x\sigma} \le \frac{1}{\sqrt{2\pi}} \frac{n(n+1)}{x\sigma}.$$

□



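Lemma 4.2. Let $\bar{C}$ be an arbitrary square matrix in $\mathbb{R}^{d \times d}$, and $C$ be a Gaussian perturbation of
$\bar{C}$ of variance $\sigma^2$. Let $\bar{b}$ be a column vector in $\mathbb{R}^d$ such that $\|\bar{b}\|_2 \le 1$, and let $b$ be a Gaussian
perturbation of $\bar{b}$ of variance $\sigma^2$. If $b$ and $C$ are independent of each other, then

$$\Pr_{b,C}\left[\left\|C^{-1} b\right\|_2 \ge x\right] \le \sqrt{\frac{2}{\pi}} \frac{\sqrt{\sigma^2 d + 1}}{x\sigma}.$$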
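Proof. Let $\hat{b}$ be the unit vector in the direction of $b$. By applying Lemma 3.2, we obtain for all $b$,

$$\Pr_C\left[\left\|C^{-1} b\right\|_2 \ge x\right] = \Pr_C\left[\left\|C^{-1} \hat{b}\right\|_2 \ge \frac{x}{\|b\|_2}\right] \le \sqrt{\frac{2}{\pi}} \frac{1}{x\sigma} \|b\|_2.$$

Let $\mu(b)$ denote the density according to which $b$ is distributed. Then, we have

$$\Pr_{b,C}\left[\left\|C^{-1} b\right\|_2 \ge x\right] = \int_{b \in \mathbb{R}^d} \Pr_C\left[\left\|C^{-1} b\right\|_2 \ge x\right] \mu(b)\, db \le \sqrt{\frac{2}{\pi}} \frac{1}{x\sigma} \int_{b \in \mathbb{R}^d} \|b\|_2\, \mu(b)\, db = \sqrt{\frac{2}{\pi}} \frac{1}{x\sigma} \mathrm{E}_b\left[\|b\|_2\right].$$

It is known [KJ82, p. 277] that $\mathrm{E}_b\left[\|b\|_2^2\right] \le \sigma^2 d + \|\bar{b}\|_2^2$. As $\mathrm{E}[X] \le \sqrt{\mathrm{E}[X^2]}$ for every positive
random variable $X$, we have $\mathrm{E}_b\left[\|b\|_2\right] \le \sqrt{\sigma^2 d + \|\bar{b}\|_2^2} \le \sqrt{\sigma^2 d + 1}$. □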



4.1.2 Second Bound for $\rho_U(A)$

In this section, we establish an upper bound on $\rho_U(A)$ which dominates the bound in Theorem 4.1
for $\sigma \ge n^{-3/2}$.

If we specialize the parameters in this bound to $\bar{A} = 0$ and $\sigma = 1$, we improve the average-case
bound proved by Yeung and Chan [YC97] (see Theorem 1.3) by a factor of $n$. Moreover, the
resulting bound agrees with their experimental results.

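Theorem 4.3 (Second bound on $\rho_U(A)$). Let $\bar{A}$ be an $n \times n$ matrix satisfying $\|\bar{A}\|_2 \le 1$, and
let $A$ be a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2 \le 1$. For $n \ge 2$,

$$\Pr[\rho_U(A) > 1 + x] \le \sqrt{\frac{2}{\pi}} \frac{1}{x} \left(\frac{2}{3} n^{3/2} + \frac{n}{\sigma} + \frac{4}{3} \frac{\sqrt{n}}{\sigma^2}\right).$$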
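The two bounds on $\rho_U(A)$ can be compared numerically. The functions below are a sketch that evaluates the right-hand sides of Theorems 4.1 and 4.3 as stated here; the names `first_bound` and `second_bound` are illustrative.

```python
import math

def first_bound(n, sigma, x):
    """Right-hand side of Theorem 4.1: n(n+1) / (sqrt(2 pi) x sigma)."""
    return n * (n + 1) / (math.sqrt(2.0 * math.pi) * x * sigma)

def second_bound(n, sigma, x):
    """Right-hand side of Theorem 4.3."""
    return math.sqrt(2.0 / math.pi) / x * (
        (2.0 / 3.0) * n ** 1.5 + n / sigma + (4.0 / 3.0) * math.sqrt(n) / sigma ** 2
    )
```

For $n = 100$ and $x = 10^6$, the second bound is the smaller of the two both at $\sigma = 1$ and at $\sigma = 0.1$, values of $\sigma$ well above $n^{-3/2}$. At $\sigma = 1$ its leading term is of order $n^{3/2}/x$.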



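Proof. As in the proof of Theorem 4.1, we will separately consider the $k$th row of $U$ for each
$2 \le k \le n$. For any such $k$, define $u$, $a$, $b$, $C$ and $D$ as in the proof of Theorem 4.1.

In the case when $k = n$, we may apply (4.3) in the proof of Theorem 4.1 to show

$$\Pr\left[\frac{\|u\|_1}{\|A\|_\infty} > 1 + x\right] \le \sqrt{\frac{2}{\pi}} \frac{n}{x\sigma}. \qquad (4.4)$$

We now turn to the case $k \le n - 1$. By (4.1) and Proposition 2.5, we have

$$\|u\|_1 \le \|a\|_1 + \left\|\left(b^T C^{-1} D\right)^T\right\|_1 \le \|a\|_1 + \sqrt{k-1}\, \left\|\left(b^T C^{-1} D\right)^T\right\|_2 = \|a\|_1 + \sqrt{k-1}\, \left\|b^T C^{-1} D\right\|_2.$$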
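The last equation follows from Proposition 2.6. Therefore, for all $k \le n - 1$,

$$\begin{aligned}
\frac{\|u\|_1}{\|A\|_\infty}
&\le \frac{\|a\|_1 + \sqrt{k-1}\, \left\|b^T C^{-1} D\right\|_2}{\|A\|_\infty} \\
&\le 1 + \frac{\sqrt{k-1}\, \left\|b^T C^{-1} D\right\|_2}{\|A\|_\infty} && \text{(by Proposition 2.7)} \\
&\le 1 + \frac{\sqrt{k-1}\, \left\|b^T C^{-1} D\right\|_2}{\left\|(A_{n,:})^T\right\|_1} && \text{(also by Proposition 2.7).}
\end{aligned}$$

We now observe that for fixed $b$ and $C$, $(b^T C^{-1}) D$ is a Gaussian random row vector of variance
$\left\|b^T C^{-1}\right\|_2^2 \sigma^2$ centered at $(b^T C^{-1}) \bar{D}$, where $\bar{D}$ is the center of $D$. We have $\|\bar{D}\|_2 \le \|\bar{A}\|_2 \le 1$, by
the assumptions of the theorem; so,

$$\left\|b^T C^{-1} \bar{D}\right\|_2 \le \left\|b^T C^{-1}\right\|_2 \left\|\bar{D}\right\|_2 \le \left\|b^T C^{-1}\right\|_2.$$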









Thus, if we let $t^T = (b^T C^{-1} D) / \left\|b^T C^{-1}\right\|_2$, then for every fixed $b$ and $C$, $t$ is a Gaussian random
column vector in $\mathbb{R}^{n-k+1}$ of variance $\sigma^2$ centered at a vector of 2-norm at most 1. We also have

$$\Pr_{b,C,D}\left[\left\|b^T C^{-1} D\right\|_2 \ge x\right] = \Pr_{b,C,t}\left[\left\|b^T C^{-1}\right\|_2 \|t\|_2 \ge x\right]. \qquad (4.5)$$



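It follows from Lemma 4.2 that

$$\Pr_{b,C}\left[\left\|b^T C^{-1}\right\|_2 \ge x\right] \le \sqrt{\frac{2}{\pi}} \frac{\sqrt{\sigma^2 (k-1) + 1}}{x\sigma}.$$

Hence, we may apply Corollary C.5 to show

$$\Pr_{b,C,t}\left[\left\|b^T C^{-1}\right\|_2 \|t\|_2 \ge x\right] \le \sqrt{\frac{2}{\pi}} \frac{\sqrt{\sigma^2 (k-1) + 1}\, \sqrt{\sigma^2 (n-k+1) + 1}}{x\sigma} \le \sqrt{\frac{2}{\pi}} \frac{1 + \frac{n\sigma^2}{2}}{x\sigma}. \qquad (4.6)$$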



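Note that $A_{n,:}$ is a Gaussian perturbation of variance $\sigma^2$ of a row vector in $\mathbb{R}^n$. As $A_{n,:}$ is
independent of $b$, $C$ and $D$, we can apply (4.5), (4.6) and Lemma C.4 to show

$$\begin{aligned}
\Pr\left[\frac{\sqrt{k-1}\, \left\|b^T C^{-1} D\right\|_2}{\left\|(A_{n,:})^T\right\|_1} \ge x\right]
&\le \sqrt{\frac{2}{\pi}} \frac{\sqrt{k-1}\left(1 + \frac{n\sigma^2}{2}\right)}{x\sigma}\, \mathrm{E}\left[\frac{1}{\left\|(A_{n,:})^T\right\|_1}\right] \\
&\le \sqrt{\frac{2}{\pi}} \frac{\sqrt{k-1}\left(1 + \frac{n\sigma^2}{2}\right)}{x\sigma} \cdot \frac{2}{n\sigma}, && \text{by Lemma A.4.}
\end{aligned}$$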
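Applying a union bound over the choices for $k$, we obtain

$$\begin{aligned}
\Pr[\rho_U(A) > 1 + x]
&\le \sqrt{\frac{2}{\pi}} \frac{n}{x\sigma} + \sum_{k=2}^{n-1} \sqrt{\frac{2}{\pi}} \frac{\sqrt{k-1}\left(1 + \frac{n\sigma^2}{2}\right)}{x\sigma} \cdot \frac{2}{n\sigma} \\
&\le \sqrt{\frac{2}{\pi}} \frac{1}{x} \left(\frac{2}{3} n^{3/2} + \frac{n}{\sigma} + \frac{4}{3} \frac{\sqrt{n}}{\sigma^2}\right),
\end{aligned}$$

where the second inequality follows from

$$\sum_{k=1}^{n-2} \sqrt{k} \le \frac{2}{3} n^{3/2}.$$

□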



4.2 Growth in L 

Let $L$ be the lower-triangular part of the LU-factorization of $A$. We have

$$L_{(k+1):n,k} = A^{(k-1)}_{(k+1):n,k} / A^{(k-1)}_{k,k},$$

where we let $A^{(k)}$ denote the matrix remaining after the first $k$ columns have been eliminated. So,
$A^{(0)} = A$.

Recall $\rho_L(A) = \|L\|_\infty$, which is equal to the maximum absolute row sum of $L$ (Proposition
2.7). We will show that it is unlikely that $\left\|L_{(k+1):n,k}\right\|_\infty$ is large by proving that it is unlikely that
$\left\|A^{(k-1)}_{(k+1):n,k}\right\|_\infty$ is large while $\left|A^{(k-1)}_{k,k}\right|$ is small.



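Theorem 4.4 ($\rho_L(A)$). Let $\bar{A}$ be an $n$-by-$n$ matrix for which $\|\bar{A}\|_2 \le 1$, and let $A$ be a Gaussian
perturbation of $\bar{A}$ of variance $\sigma^2 \le 1$. If $n \ge 2$, then,

$$\Pr[\rho_L(A) > x] \le \sqrt{\frac{2}{\pi}} \frac{n^2}{x} \left(\frac{\sqrt{2}}{\sigma} + \sqrt{2\ln n} + \frac{1}{\sqrt{2\pi}\, \ln n}\right).$$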
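Proof. For each $k$ between 1 and $n - 1$, we have

$$\begin{aligned}
L_{(k+1):n,k} &= \frac{A^{(k-1)}_{(k+1):n,k}}{A^{(k-1)}_{k,k}} \\
&= \frac{A_{(k+1):n,k} - A_{(k+1):n,1:(k-1)} A^{-1}_{1:(k-1),1:(k-1)} A_{1:(k-1),k}}{A_{k,k} - A_{k,1:(k-1)} A^{-1}_{1:(k-1),1:(k-1)} A_{1:(k-1),k}} \\
&= \frac{A_{(k+1):n,k} - A_{(k+1):n,1:(k-1)} v}{A_{k,k} - A_{k,1:(k-1)} v},
\end{aligned}$$

where we let $v = A^{-1}_{1:(k-1),1:(k-1)} A_{1:(k-1),k}$. Since $\|\bar{A}\|_2 \le 1$, and all the terms $A_{(k+1):n,k}$, $A_{(k+1):n,1:(k-1)}$,
$A_{k,k}$, $A_{k,1:(k-1)}$ and $v$ are independent, we can apply Lemma 4.5 to show that

$$\begin{aligned}
\Pr\left[\left\|L_{(k+1):n,k}\right\|_\infty > x\right]
&\le \sqrt{\frac{2}{\pi}} \frac{1}{x} \left(\frac{\sqrt{2}}{\sigma} + \sqrt{2\ln(\max(n-k, 2))} + \frac{1}{\sqrt{2\pi}\, \ln(\max(n-k, 2))}\right) \\
&\le \sqrt{\frac{2}{\pi}} \frac{1}{x} \left(\frac{\sqrt{2}}{\sigma} + \sqrt{2\ln n} + \frac{1}{\sqrt{2\pi}\, \ln n}\right),
\end{aligned}$$

where the last inequality follows from the facts that $\sqrt{2z} + \frac{1}{\sqrt{2\pi}\, z}$ is an increasing function when $z \ge \pi^{-1/3}$,
and $\ln 2 \ge \pi^{-1/3}$.

The theorem now follows by applying a union bound over the $n$ choices for $k$ and observing
that $\|L\|_\infty$ is at most $n$ times the largest entry in $L$. □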
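The two elementary facts used in the last inequality of this proof can be checked numerically. The sketch below evaluates $g(z) = \sqrt{2z} + 1/(\sqrt{2\pi}\,z)$ on a grid starting at $\pi^{-1/3}$; the names `g`, `threshold` and `is_nondecreasing` are illustrative, and a grid check is of course only a sanity check, not a proof of monotonicity.

```python
import math

def g(z):
    """The function sqrt(2 z) + 1 / (sqrt(2 pi) z) from the proof of Theorem 4.4."""
    return math.sqrt(2.0 * z) + 1.0 / (math.sqrt(2.0 * math.pi) * z)

threshold = math.pi ** (-1.0 / 3.0)           # about 0.683
ln2_clears_threshold = math.log(2.0) >= threshold

# Sample g on [threshold, threshold + 14) in steps of 0.01.
zs = [threshold + 0.01 * i for i in range(1400)]
is_nondecreasing = all(g(zs[i]) <= g(zs[i + 1]) for i in range(len(zs) - 1))
```

The derivative of $g$ is $1/\sqrt{2z} - 1/(\sqrt{2\pi}\,z^2)$, which vanishes exactly at $z = \pi^{-1/3}$ and is positive beyond it, in agreement with the grid check.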

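Lemma 4.5 (Vector Ratio). Let $d$ and $n$ be positive integers. Let $a$, $b$, $\mathbf{x}$, and $Y$ be Gaussian
perturbations of $\bar{a} \in \mathbb{R}^1$, $\bar{b} \in \mathbb{R}^d$, $\bar{\mathbf{x}} \in \mathbb{R}^n$, and $\bar{Y} \in \mathbb{R}^{n \times d}$, respectively, of variance $\sigma^2$, such that
$|\bar{a}| \le 1$, $\|\bar{b}\|_2 \le 1$, $\|\bar{\mathbf{x}}\|_2 \le 1$, and $\|\bar{Y}\|_2 \le 1$. Let $v$ be an arbitrary vector in $\mathbb{R}^d$. If $a$, $b$, $\mathbf{x}$, and $Y$
are independent and $\sigma^2 \le 1$, then

$$\Pr\left[\frac{\|\mathbf{x} + Yv\|_\infty}{\left|a + b^T v\right|} > x\right] \le \sqrt{\frac{2}{\pi}} \frac{1}{x} \left(\frac{\sqrt{2}}{\sigma} + \sqrt{2\ln\max(n, 2)} + \frac{1}{\sqrt{2\pi}\, \ln\max(n, 2)}\right).$$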



Proof. We begin by observing that $a + b^T v$ and each component of $\mathbf{x} + Yv$ is a Gaussian random
variable of variance $\sigma^2 (1 + \|v\|_2^2)$ whose mean has absolute value at most $1 + \|v\|_2$, and that all
these variables are independent. By Lemma A.3,

$$\mathrm{E}_{\mathbf{x},Y}\left[\|\mathbf{x} + Yv\|_\infty\right] \le 1 + \|v\|_2 + \sigma\sqrt{1 + \|v\|_2^2} \left(\sqrt{2\ln\max(n, 2)} + \frac{1}{\sqrt{2\pi}\, \ln\max(n, 2)}\right).$$
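On the other hand, Lemma A.2 implies

$$\Pr_{a,b}\left[\frac{1}{\left|a + b^T v\right|} > x\right] \le \sqrt{\frac{2}{\pi}} \frac{1}{x\sigma\sqrt{1 + \|v\|_2^2}}. \qquad (4.7)$$

Thus, we can apply Corollary C.4 to show

$$\begin{aligned}
\Pr\left[\frac{\|\mathbf{x} + Yv\|_\infty}{\left|a + b^T v\right|} > x\right]
&\le \sqrt{\frac{2}{\pi}} \frac{1 + \|v\|_2 + \sigma\sqrt{1 + \|v\|_2^2} \left(\sqrt{2\ln\max(n, 2)} + \frac{1}{\sqrt{2\pi}\, \ln\max(n, 2)}\right)}{x\sigma\sqrt{1 + \|v\|_2^2}} \\
&= \sqrt{\frac{2}{\pi}} \frac{1}{x} \left(\frac{1 + \|v\|_2}{\sigma\sqrt{1 + \|v\|_2^2}} + \frac{\sigma\sqrt{1 + \|v\|_2^2} \left(\sqrt{2\ln\max(n, 2)} + \frac{1}{\sqrt{2\pi}\, \ln\max(n, 2)}\right)}{\sigma\sqrt{1 + \|v\|_2^2}}\right) \\
&\le \sqrt{\frac{2}{\pi}} \frac{1}{x} \left(\frac{\sqrt{2}}{\sigma} + \sqrt{2\ln\max(n, 2)} + \frac{1}{\sqrt{2\pi}\, \ln\max(n, 2)}\right),
\end{aligned}$$

where the last inequality follows from $(1 + z)^2 \le 2(1 + z^2)$, $\forall z \ge 0$. □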



5 Smoothed Analysis of Gaussian Elimination 

We now combine the results from the previous sections to bound the smoothed precision needed 
in the application of Gaussian elimination without pivoting to obtain solutions to linear systems 
accurate to b bits. 

Theorem 5.1 (Smoothed precision of Gaussian elimination). For $n \ge e^4$, let $\bar{A}$ be an $n$-by-$n$ matrix for which $\|\bar{A}\|_2 \le 1$, and let $A$ be a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2 \le 1/4$. Then, the expected number of bits of precision necessary to solve $Ax = b$ to $b$ bits of accuracy using Gaussian elimination without pivoting is at most
\[
b + \frac{11}{2}\log_2 n + 3\log_2\frac{1}{\sigma} + \log_2\left(1 + 2\sqrt{n}\,\sigma\right) + \frac{1}{2}\log_2\log_2 n + 6.83.
\]

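Proof. By Wilkinson's theorem, we need the machine precision, $\epsilon_{mach}$, to satisfy
\[
5 \cdot 2^b\, n\, \rho_L(A)\,\rho_U(A)\,\kappa(A)\,\epsilon_{mach} \le 1,
\]
which holds whenever
\[
2.33 + b + \log_2 n + \log_2(\rho_L(A)) + \max(0, \log_2(\rho_U(A))) + \log_2(\kappa(A)) \le \log_2(1/\epsilon_{mach}).
\]
We will apply Lemma C.6 to bound these log terms. Theorem 4.1 tells us that
\[
\Pr\left[\rho_U(A) > 1 + x\right] \le \frac{1}{\sqrt{2\pi}}\,\frac{n(n+1)}{x\sigma}.
\]
To put this inequality into a form to which Lemma C.6 may be applied, we set
\[
y = x\left(1 + \frac{\sqrt{2\pi}\,\sigma}{n(n+1)}\right),
\]
to obtain
\[
\Pr\left[\rho_U(A) > y\right] \le \left(\frac{n(n+1)}{\sqrt{2\pi}\,\sigma} + 1\right)\frac{1}{y}.
\]
By Lemma C.6,
\[
\begin{aligned}
\mathbf{E}\left[\max(0, \log_2\rho_U(A))\right]
&\le \log_2\left(\frac{n(n+1)}{\sqrt{2\pi}\,\sigma} + 1\right) + \log_2 e \\
&\le \log_2\left(n(n+1) + \sigma\sqrt{2\pi}\right) + \log_2\frac{1}{\sigma} + \log_2\frac{e}{\sqrt{2\pi}} \\
&\le \log_2\left(1.02\,n^2\right) + \log_2\frac{1}{\sigma} + \log_2\frac{e}{\sqrt{2\pi}} \\
&\le 2\log_2 n + \log_2\frac{1}{\sigma} + 0.15,
\end{aligned}
\]
where in the second-to-last inequality, we used the assumptions $n \ge e^4$ and $\sigma \le 1/2$. In the last inequality, we numerically computed $\log_2(1.02e/\sqrt{2\pi}) < 0.15$.

Theorem 4.4 and Lemma C.6 imply
\[
\begin{aligned}
\mathbf{E}\left[\log_2\rho_L(A)\right]
&\le \log_2\left(\sqrt{\frac{2}{\pi}}\,n^2\left(\frac{\sqrt{2}}{\sigma} + \sqrt{2\ln n} + \frac{1}{\sqrt{2\pi}\,\ln n}\right)\right) + \log_2 e \\
&\le 2\log_2 n + \log_2\left(\frac{1}{\sigma} + \sqrt{\ln n}\left(1 + \frac{1}{2\sqrt{\pi}\,\ln^{3/2} n}\right)\right) + \log_2\frac{2e}{\sqrt{\pi}} \\
&= 2\log_2 n + \log_2\frac{1}{\sigma} + \log_2\sqrt{\ln n} + \log_2\left(\frac{1}{\sqrt{\ln n}} + \sigma\left(1 + \frac{1}{2\sqrt{\pi}\,\ln^{3/2} n}\right)\right) + \log_2\frac{2e}{\sqrt{\pi}};
\end{aligned}
\]
using $\sigma \le 1/2$ and $n \ge e^4$,
\[
\begin{aligned}
&\le 2\log_2 n + \log_2\frac{1}{\sigma} + \frac{1}{2}\log_2\log_2 n + \log_2\left(1 + \frac{1}{16\sqrt{\pi}}\right) + \log_2\frac{2e}{\sqrt{\pi}} \\
&\le 2\log_2 n + \log_2\frac{1}{\sigma} + \frac{1}{2}\log_2\log_2 n + 1.67,
\end{aligned}
\]
as $\log_2(1 + 1/(16\sqrt{\pi})) + \log_2(2e/\sqrt{\pi}) < 1.67$. Theorem 3.3 and Lemma C.6, along with the observation that $\log_2(2.35e) < 2.68$, imply
\[
\mathbf{E}\left[\log_2\left\|A^{-1}\right\|_2\right] \le \frac{1}{2}\log_2 n + \log_2\frac{1}{\sigma} + 2.68.
\]
Finally,
\[
\mathbf{E}\left[\log_2(\|A\|_2)\right] \le \log_2\left(1 + 2\sqrt{n}\,\sigma\right)
\]
follows from the well-known facts that the expectation of $\|A - \bar{A}\|_2$ is at most $2\sqrt{n}\,\sigma$ (c.f., [Seg00]) and that $\mathbf{E}[\log_2(X)] \le \log_2\mathbf{E}[X]$ for every positive random variable $X$. Thus, the expected number of digits of precision needed is at most
\[
b + \frac{11}{2}\log_2 n + 3\log_2\frac{1}{\sigma} + \log_2\left(1 + 2\sqrt{n}\,\sigma\right) + \frac{1}{2}\log_2\log_2 n + 6.83.
\]
□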
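To get a feel for the size of this bound, the following Python sketch simply evaluates the expression of Theorem 5.1. The function name `smoothed_precision_bits` and the sample parameters are our own illustrative choices; the code checks the theorem's hypotheses $n \ge e^4$ and $\sigma \le 1/2$ but otherwise does nothing beyond arithmetic.

```python
import math


def smoothed_precision_bits(b, n, sigma):
    """Evaluate the bound of Theorem 5.1 on the expected bits of precision.

    b     -- requested bits of accuracy
    n     -- matrix dimension (the theorem assumes n >= e^4)
    sigma -- standard deviation of the perturbation (sigma^2 <= 1/4)
    """
    if n < math.exp(4) or not (0 < sigma <= 0.5):
        raise ValueError("Theorem 5.1 assumes n >= e^4 and 0 < sigma <= 1/2")
    log2 = math.log2
    return (b
            + 5.5 * log2(n)
            + 3 * log2(1 / sigma)
            + log2(1 + 2 * math.sqrt(n) * sigma)
            + 0.5 * log2(log2(n))
            + 6.83)


# The additive constant is the sum of the four constants used in the proof.
constant = 2.33 + 0.15 + 1.67 + 2.68

# Hypothetical sample point: 10 bits of accuracy, n = 1024, sigma = 2^-10.
bits = smoothed_precision_bits(b=10, n=1024, sigma=2 ** -10)
```

For this sample point the bound is a little under 104 bits, most of which comes from the $\frac{11}{2}\log_2 n$ and $3\log_2(1/\sigma)$ terms.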

The following conjecture would further improve the coefficient of $\log(1/\sigma)$.

Conjecture 2. Let $\bar{A}$ be an $n$-by-$n$ matrix for which $\|\bar{A}\|_2 \le 1$, and let $A$ be a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2 \le 1$. Then
\[
\Pr\left[\rho_L(A)\,\rho_U(A)\,\kappa(A) > x\right] \le \frac{n^{c_1}\log^{c_2}(x)}{x\sigma}
\]
for some constants $c_1$ and $c_2$.



6 Zero-preserving perturbations of symmetric matrices with diagonals

Many matrices that occur in practice are symmetric and sparse. Moreover, many matrix algorithms 
take advantage of this structure. Thus, it is natural to study the smoothed analysis of algorithms 
under perturbations that respect symmetry and non-zero structure. In this section, we study 
the condition numbers and growth factors of Gaussian elimination without pivoting of symmetric 
matrices under perturbations that only alter their diagonal and non-zero entries. 

Definition 6.1 (Zero-preserving perturbations). Let $\bar{T}$ be a matrix. We define the zero-preserving perturbation of $\bar{T}$ of variance $\sigma^2$ to be the matrix $T$ obtained by adding independent Gaussian random variables of mean 0 and variance $\sigma^2$ to the non-zero entries of $\bar{T}$.

Throughout this section, when we express a symmetric matrix $A$ as $T + D + T^T$, we mean that $T$ is lower-triangular with zeros on the diagonal and $D$ is a diagonal matrix. By making a zero-preserving perturbation to $\bar{T}$, we preserve the symmetry of the matrix. The main results of this section are that the smoothed condition number and growth factors of symmetric matrices under zero-preserving perturbations to $\bar{T}$ and diagonal perturbations to $\bar{D}$ have distributions similar to those proved in Sections 3 and 4 for dense matrices under dense perturbations.
6.1 Bounding the condition number

We begin by recalling that the singular values and vectors of symmetric matrices are the eigenvalues and eigenvectors.

Lemma 6.2. Let $\bar{A} = \bar{T} + \bar{D} + \bar{T}^T$ be an arbitrary $n$-by-$n$ symmetric matrix. Let $T$ be a zero-preserving perturbation of $\bar{T}$ of variance $\sigma^2$, let $G_D$ be a diagonal matrix of independent Gaussian random variables of variance $\sigma^2$ and mean 0 that are independent of $T$, and let $D = \bar{D} + G_D$. Then, for $A = T + D + T^T$,
\[
\Pr\left[\left\|A^{-1}\right\|_2 \ge x\right] \le \sqrt{\frac{2}{\pi}}\,\frac{n^{3/2}}{x\sigma}.
\]

Proof. By Proposition 2.1,
\[
\Pr_{T,G_D}\left[\left\|(T + D + T^T)^{-1}\right\|_2 \ge x\right] \le \max_{T}\,\Pr_{G_D}\left[\left\|\left((T + \bar{D} + T^T) + G_D\right)^{-1}\right\|_2 \ge x\right].
\]
The proof now follows from Lemma 6.3, taking $T + \bar{D} + T^T$ as the base matrix. □



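Lemma 6.3. Let $\bar{A}$ be an arbitrary $n$-by-$n$ symmetric matrix, let $G_D$ be a diagonal matrix of independent Gaussian random variables of variance $\sigma^2$ and mean 0, and let $A = \bar{A} + G_D$. Then,
\[
\Pr\left[\left\|A^{-1}\right\|_2 \ge x\right] \le \sqrt{\frac{2}{\pi}}\,\frac{n^{3/2}}{x\sigma}.
\]

Proof. Let $x_1, \ldots, x_n$ be the diagonal entries of $G_D$, and let
\[
g = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad\text{and}\quad y_i = x_i - g.
\]
Then,
\[
\begin{aligned}
\Pr_{G_D}\left[\left\|(\bar{A} + G_D)^{-1}\right\|_2 \ge x\right]
&= \Pr_{y_1,\ldots,y_n,g}\left[\left\|\left(\bar{A} + \mathrm{diag}(y_1,\ldots,y_n) + gI\right)^{-1}\right\|_2 \ge x\right] \\
&\le \max_{y_1,\ldots,y_n}\,\Pr_{g}\left[\left\|\left(\bar{A} + \mathrm{diag}(y_1,\ldots,y_n) + gI\right)^{-1}\right\|_2 \ge x\right],
\end{aligned}
\]
where the last inequality follows from Proposition 2.1. The proof now follows from Proposition 6.4 and Lemma 6.5. □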

Proposition 6.4. Let $X_1, \ldots, X_n$ be independent Gaussian random variables of variance $\sigma^2$ with means $a_1, \ldots, a_n$, respectively. Let
\[
G = \frac{1}{n}\sum_{i=1}^{n} X_i, \quad\text{and}\quad Y_i = X_i - G.
\]
Then, $G$ is a Gaussian random variable of variance $\sigma^2/n$ with mean $(1/n)\sum a_i$, independent of $Y_1, \ldots, Y_n$.
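One way to see why Proposition 6.4 holds: $G$ and the $Y_i$ are linear forms in the jointly Gaussian $X_j$, so independence is equivalent to zero covariance. The Python sketch below computes these covariances exactly from the coefficient vectors; the helper names and the values $n = 5$, $\sigma^2 = 0.25$ are illustrative assumptions.

```python
def mean_and_residual_coeffs(n):
    """Coefficient vectors expressing G and Y_1..Y_n as linear forms in X_1..X_n."""
    g = [1.0 / n] * n
    ys = [[(1.0 if j == i else 0.0) - 1.0 / n for j in range(n)]
          for i in range(n)]
    return g, ys


def covariance(u, v, sigma2):
    """Cov(u.X, v.X) for independent X_j, each of variance sigma2."""
    return sigma2 * sum(a * b for a, b in zip(u, v))


n, sigma2 = 5, 0.25
g, ys = mean_and_residual_coeffs(n)
var_g = covariance(g, g, sigma2)                # equals sigma^2 / n
cross = [covariance(g, y, sigma2) for y in ys]  # each vanishes up to rounding
```

The cross-covariances vanish because $\frac{1}{n}\left(1 - \frac{1}{n}\right) - \frac{n-1}{n^2} = 0$, which is the whole content of the independence claim for Gaussian variables.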

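Lemma 6.5. Let $\bar{A}$ be an arbitrary $n$-by-$n$ symmetric matrix, and let $G$ be a Gaussian random variable of mean 0 and variance $\sigma^2/n$. Let $A = \bar{A} + GI$. Then,
\[
\Pr\left[\left\|A^{-1}\right\|_2 \ge x\right] \le \sqrt{\frac{2}{\pi}}\,\frac{n^{3/2}}{x\sigma}.
\]

Proof. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $\bar{A}$. Then,
\[
\left\|(\bar{A} + GI)^{-1}\right\|_2^{-1} = \min_i |\lambda_i + G|.
\]
Thus,
\[
\begin{aligned}
\Pr\left[\left\|A^{-1}\right\|_2 \ge x\right]
&= \Pr_G\left[\min_i |\lambda_i + G| \le \frac{1}{x}\right] \\
&\le \sum_i \Pr_G\left[|\lambda_i + G| \le \frac{1}{x}\right] \\
&\le \sum_i \sqrt{\frac{2}{\pi}}\,\frac{\sqrt{n}}{x\sigma} \\
&\le \sqrt{\frac{2}{\pi}}\,\frac{n^{3/2}}{x\sigma},
\end{aligned}
\]
where the second-to-last inequality follows from Lemma A.2 for $\mathbb{R}^1$. □

As in Section 3, we can now prove: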



Theorem 6.6 (Condition number of symmetric matrices). Let $\bar{A} = \bar{T} + \bar{D} + \bar{T}^T$ be an arbitrary $n$-by-$n$ symmetric matrix satisfying $\|\bar{A}\|_2 \le \sqrt{n}$. Let $\sigma^2 \le 1$, let $T$ be a zero-preserving perturbation of $\bar{T}$ of variance $\sigma^2$, let $G_D$ be a diagonal matrix of independent Gaussian random variables of variance $\sigma^2$ and mean 0 that are independent of $T$, and let $D = \bar{D} + G_D$. Then, for $A = T + D + T^T$,
\[
\Pr\left[\kappa(A) \ge x\right] \le 6\sqrt{\frac{2}{\pi}}\,\frac{n^{7/2}}{x\sigma}\left(1 + \sqrt{2\ln(x)/9n}\right).
\]

Proof. As in the proof of Theorem 3.1, we can apply the techniques used in the proof of [DS01, Theorem II.7] to show
\[
\Pr\left[\|A - \bar{A}\|_2 \ge 2\sqrt{n} + k\right] < e^{-k^2/2}.
\]
The rest of the proof follows the outline of the proof of Theorem 3.1, using Lemma 6.2 instead of Theorem 3.3. □



6.2 Bounding entries in U 

In this section, we will prove: 
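Theorem 6.7 ($\rho_U(A)$ of symmetric matrices). Let $\bar{A} = \bar{T} + \bar{D} + \bar{T}^T$ be an arbitrary $n$-by-$n$ symmetric matrix satisfying $\|\bar{A}\|_2 \le 1$. Let $\sigma^2 \le 1$, let $T$ be a zero-preserving perturbation of $\bar{T}$ of variance $\sigma^2$, let $G_D$ be a diagonal matrix of independent Gaussian random variables of variance $\sigma^2$ and mean 0 that are independent of $T$, and let $D = \bar{D} + G_D$. Then, for $A = T + D + T^T$,
\[
\Pr\left[\rho_U(A) > 1 + x\right] \le \frac{2}{7}\sqrt{\frac{2}{\pi}}\,\frac{n^{7/2}}{x\sigma}.
\]

Proof. We proceed as in the proof of Theorem 4.1. For $k$ between 2 and $n$, we define $u$, $a$, $b$ and $C$ as in the proof of Theorem 4.1. By (4.2),
\[
\frac{\|u\|_1}{\|A\|_\infty} \le 1 + \left\|b^T C^{-1}\right\|_1 \le 1 + \sqrt{k-1}\,\left\|b^T C^{-1}\right\|_2 \le 1 + \sqrt{k-1}\,\|b\|_2\left\|C^{-1}\right\|_2.
\]
Hence
\[
\begin{aligned}
\Pr\left[\frac{\|u\|_1}{\|A\|_\infty} > 1 + x\right]
&\le \Pr\left[\|b\|_2\left\|C^{-1}\right\|_2 > \frac{x}{\sqrt{k-1}}\right] \\
&\le \mathbf{E}\left[\|b\|_2\right]\sqrt{\frac{2}{\pi}}\,\frac{\sqrt{k-1}\,(k-1)^{3/2}}{x\sigma}, \quad\text{by Lemmas 6.2 and C.4,} \\
&\le \sqrt{1 + j\sigma^2}\,\sqrt{\frac{2}{\pi}}\,\frac{(k-1)^2}{x\sigma}, \quad\text{where $j$ is the number of non-zeros in $b$,} \\
&\le \sqrt{\frac{2}{\pi}}\,\frac{\sqrt{k}\,(k-1)^2}{x\sigma}.
\end{aligned}
\]
Applying a union bound over $k$,
\[
\Pr\left[\rho_U(A) > 1 + x\right] \le \sqrt{\frac{2}{\pi}}\,\frac{1}{x\sigma}\sum_{k=2}^{n}\sqrt{k}\,(k-1)^2 \le \frac{2}{7}\sqrt{\frac{2}{\pi}}\,\frac{n^{7/2}}{x\sigma}.
\]
□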



6.3 Bounding entries in L

As in Section 4.2, we derive a bound on the growth factor of $L$. As before, we will show that it is unlikely that $A^{(k-1)}_{j,k}$ is large while $A^{(k-1)}_{k,k}$ is small. However, our techniques must differ from those used in Section 4.2, as the proof in that section made critical use of the independence of $A_{k,1:(k-1)}$ and $A_{1:(k-1),k}$.

Theorem 6.8 ($\rho_L(A)$ of symmetric matrices). Let $\sigma^2 \le 1$ and $n \ge 2$. Let $\bar{A} = \bar{T} + \bar{D} + \bar{T}^T$ be an arbitrary $n$-by-$n$ symmetric matrix satisfying $\|\bar{A}\|_2 \le 1$. Let $T$ be a zero-preserving perturbation of $\bar{T}$ of variance $\sigma^2$, let $G_D$ be a diagonal matrix of independent Gaussian random variables of variance $\sigma^2 \le 1$ and mean 0 that are independent of $T$, and let $D = \bar{D} + G_D$. Let $A = T + D + T^T$. Then,
\[
\forall x \ge \sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma^2}, \qquad \Pr\left[\rho_L(A) > x\right] \le \frac{3.2\,n^4}{x\sigma^2}\,\ln^{3/2}\left(e\sqrt{\pi/2}\;x\sigma^2\right).
\]

Proof. Using Lemma 6.9, we obtain for all $k$
\[
\Pr\left[\exists j > k : |L_{j,k}| > x\right] \le \Pr\left[\left\|L_{(k+1):n,k}\right\|_\infty > x\right] \le \frac{3.2\,n^2}{x\sigma^2}\,\ln^{3/2}\left(e\sqrt{\pi/2}\;x\sigma^2\right).
\]
Applying a union bound over the choices for $k$, we then have
\[
\Pr\left[\exists j, k : |L_{j,k}| > x\right] \le \frac{3.2\,n^3}{x\sigma^2}\,\ln^{3/2}\left(e\sqrt{\pi/2}\;x\sigma^2\right).
\]
The result now follows from the fact that $\|L\|_\infty$ is at most $n$ times the largest entry in $L$. □
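
Lemma 6.9. Under the conditions of Theorem 6.8,
\[
\forall x \ge \sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma^2}, \qquad \Pr\left[\left\|L_{(k+1):n,k}\right\|_\infty > x\right] \le \frac{3.2\,n^2}{x\sigma^2}\,\ln^{3/2}\left(e\sqrt{\pi/2}\;x\sigma^2\right).
\]

Proof. We recall that
\[
L_{k+1:n,k} = \frac{A_{k+1:n,k} - A_{k+1:n,1:k-1}\,A_{1:k-1,1:k-1}^{-1}\,A_{1:k-1,k}}{A_{k,k} - A_{k,1:k-1}\,A_{1:k-1,1:k-1}^{-1}\,A_{1:k-1,k}}.
\]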



Because of the symmetry of $A$, $A_{k,1:k-1}$ is the same as $A_{1:k-1,k}$, so we can no longer use the proof technique that worked in Section 4.2. Instead, we will bound the tails of the numerator and denominator separately, exploiting the fact that only the denominator depends upon $A_{k,k}$.

Consider the numerator first. Setting $v = A_{1:k-1,1:k-1}^{-1} A_{1:k-1,k}$, the numerator can be written $A_{k+1:n,1:k}\begin{pmatrix} -v \\ 1 \end{pmatrix}$. We will now prove that for all $x \ge 1/\sigma$,
\[
\Pr_{A_{k+1:n,1:k},\,A_{1:k-1,1:k}}\left[\left\|A_{k+1:n,1:k}\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_\infty > x\right] \le \sqrt{\frac{2}{\pi}}\left(\frac{2n^2\left(1 + \sigma\sqrt{2\ln(x\sigma)}\right) + n}{x\sigma}\right). \tag{6.1}
\]
Let
\[
c = \frac{1}{1 + \sigma\sqrt{2\ln(x\sigma)}}, \tag{6.2}
\]
which implies $\frac{1-c}{c\sigma} = \sqrt{2\ln(x\sigma)}$. It suffices to prove (6.1) for all $x$ for which the right-hand side is less than 1. Given that $x \ge 1/\sigma$, it suffices to consider $x$ for which $cx \ge 2$ and $x\sigma \ge 2$.
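We use the parameter $c$ to divide the probability as follows:
\[
\begin{aligned}
\Pr_{A_{k+1:n,1:k},\,A_{1:k-1,1:k}}&\left[\left\|A_{k+1:n,1:k}\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_\infty > x\right] \\
&\le \Pr_{A_{1:(k-1),1:k}}\left[\left\|\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_2 > cx\right] && (6.3) \\
&\quad + \Pr_{A_{k+1:n,1:k}}\left[\left\|A_{k+1:n,1:k}\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_\infty > \frac{1}{c}\left\|\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_2 \;\middle|\; \left\|\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_2 \le cx\right]. && (6.4)
\end{aligned}
\]
To evaluate (6.4), we note that once $v$ is fixed, each component of $A_{k+1:n,1:k}\begin{pmatrix} -v \\ 1 \end{pmatrix}$ is a Gaussian random variable of variance $\left\|\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_2^2\sigma^2$ and mean at most $\left\|\bar{A}_{k+1:n,1:k}\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_2 \le \left\|\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_2$. So,
\[
\left\|A_{k+1:n,1:k}\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_\infty > \frac{1}{c}\left\|\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_2
\]
implies one of the Gaussian random variables differs from its mean by more than $(1/c - 1)/\sigma$ times its standard deviation, and we can therefore apply Lemma A.1 and a union bound to derive
\[
(6.4) \le \sqrt{\frac{2}{\pi}}\,\frac{n\,e^{-\frac{1}{2}\left(\frac{1-c}{c\sigma}\right)^2}}{\frac{1-c}{c\sigma}} = \sqrt{\frac{2}{\pi}}\,\frac{n}{x\sigma\sqrt{2\ln(x\sigma)}}.
\]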

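To bound (6.3), we note that Lemma 6.2 and Corollary C.5 imply
\[
\Pr_{A_{1:(k-1),1:k}}\left[\left\|A_{1:k-1,1:k-1}^{-1}A_{1:k-1,k}\right\|_2 > y\right] \le \sqrt{\frac{2}{\pi}}\,\frac{n^2}{y\sigma},
\]
and so
\[
\begin{aligned}
\Pr_{A_{1:(k-1),1:k}}\left[\left\|\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_2 > cx\right]
&\le \Pr_{A_{1:(k-1),1:k}}\left[\left\|A_{1:k-1,1:k-1}^{-1}A_{1:k-1,k}\right\|_2 > cx - 1\right] \\
&\le \sqrt{\frac{2}{\pi}}\,\frac{n^2}{(cx - 1)\sigma} \\
&= \sqrt{\frac{2}{\pi}}\,\frac{n^2}{cx\sigma(1 - 1/cx)} \\
&= \sqrt{\frac{2}{\pi}}\,\frac{n^2\left(1 + \sigma\sqrt{2\ln(x\sigma)}\right)}{x\sigma(1 - 1/cx)} \\
&\le \sqrt{\frac{2}{\pi}}\,\frac{2n^2\left(1 + \sigma\sqrt{2\ln(x\sigma)}\right)}{x\sigma},
\end{aligned}
\]
by $cx \ge 2$.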
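So,
\[
\begin{aligned}
\Pr_{A_{k+1:n,1:k},\,A_{1:k-1,1:k}}\left[\left\|A_{k+1:n,1:k}\begin{pmatrix} -v \\ 1 \end{pmatrix}\right\|_\infty > x\right]
&\le \sqrt{\frac{2}{\pi}}\left(\frac{n}{x\sigma\sqrt{2\ln(x\sigma)}} + \frac{2n^2\left(1 + \sigma\sqrt{2\ln(x\sigma)}\right)}{x\sigma}\right) \\
&\le \sqrt{\frac{2}{\pi}}\left(\frac{2n^2\left(1 + \sigma\sqrt{2\ln(x\sigma)}\right) + n}{x\sigma}\right), && (6.5)
\end{aligned}
\]
by the assumption $x\sigma \ge 2$, which proves (6.1).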

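As for the denominator, we note that $A_{k,k}$ is independent of all other terms, and hence
\[
\Pr\left[\left|A_{k,k} - A_{k,1:k-1}\,A_{1:k-1,1:k-1}^{-1}\,A_{1:k-1,k}\right| < 1/x\right] \le \sqrt{\frac{2}{\pi}}\,\frac{1}{x\sigma}, \tag{6.6}
\]
by Lemma A.2. Applying Corollary C.3 with
\[
\alpha = \sqrt{\frac{2}{\pi}}\left(2n^2 + n\right), \qquad \beta = \frac{4n^2\sigma}{\sqrt{\pi}}, \qquad \gamma = \sqrt{\frac{2}{\pi}},
\]
to combine (6.5) with (6.6), we derive the bound
\[
\begin{aligned}
&\frac{2}{\pi x\sigma^2}\left(2n^2 + n + \left(\left(2 + 4\sqrt{2}\,\sigma/3\right)n^2 + n\right)\ln^{3/2}\left(\sqrt{\pi/2}\;x\sigma^2\right)\right) \\
&\qquad\le \frac{2n^2}{\pi x\sigma^2}\left(3 + 4\sqrt{2}\,\sigma/3\right)\left(\ln^{3/2}\left(\sqrt{\pi/2}\;x\sigma^2\right) + 1\right) \\
&\qquad\le \frac{3.2\,n^2}{x\sigma^2}\,\ln^{3/2}\left(e\sqrt{\pi/2}\;x\sigma^2\right),
\end{aligned}
\]
as $\sigma \le 1$. □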



7 Conclusions and open problems 

7.1 Generality of results 

In this paper, we have presented bounds on the smoothed values of the condition number and growth 
factors assuming the input matrix is subjected to a slight Gaussian perturbation. We would like 
to point out here that our results can be extended to some other families of perturbations. 

With the exception of the proof of Theorem 3.3, the only properties of Gaussian random vectors that we used in Sections 3 and 4 are

1. there is a constant $c$ for which the probability that a Gaussian random vector has distance less than $\epsilon$ to a hyperplane is at most $c\epsilon$, and

2. it is exponentially unlikely that a Gaussian random vector lies far from its mean.

Moreover, a result similar to Theorem 3.3 but with an extra factor of $d$ could be proved using just fact 1.

In fact, results of a character similar to ours would still hold if the second condition were reduced to a polynomial probability. Many other families of perturbations share these properties. For example, similar results would hold if we let $A = \bar{A} + U$, where $U$ is a matrix of variables independently uniformly chosen in $[-\sigma, \sigma]$, or if $A = \bar{A} + S$, where the columns of $S$ are chosen uniformly among those vectors of norm at most $\sigma$.

7.2 Counter-Examples 

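The results of Sections 3 and 4 do not extend to zero-preserving perturbations for non-symmetric matrices. For example, the following matrix remains ill-conditioned under zero-preserving perturbations.
\[
\begin{pmatrix}
1 & -2 & 0 & 0 & 0 \\
0 & 1 & -2 & 0 & 0 \\
0 & 0 & 1 & -2 & 0 \\
0 & 0 & 0 & 1 & -2 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]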

A symmetric matrix that remains ill-conditioned under zero-preserving perturbations that do not alter the diagonal can be obtained by locating the above matrix in the upper-right quadrant, and its transpose in the lower-left quadrant; writing $M$ for the above matrix, this is the block matrix
\[
\begin{pmatrix} 0 & M \\ M^T & 0 \end{pmatrix}.
\]

The following matrix maintains large growth factor under zero-preserving perturbations, regardless of whether partial pivoting or no pivoting is used.





















1.1 








-1 


1.1 





-1 


-1 


1.1 



1 -1 -1 



These examples can be easily normalized so that their 2-norms are equal to 1.
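
The first counter-example can be checked numerically. The Python sketch below assumes the matrix is upper bidiagonal with 1 on the diagonal and $-2$ on the superdiagonal (our reading of the display above), perturbs only those non-zero entries, and recovers the last column of the inverse by back substitution. The dimension 30, the value $\sigma = 0.01$, the seed, and the function names are illustrative assumptions.

```python
import random


def perturbed_bidiagonal(n, sigma, rng):
    """Diagonal and superdiagonal of the n-by-n matrix with 1 on the diagonal
    and -2 above it, after adding N(0, sigma^2) noise to the non-zero
    entries only (a zero-preserving perturbation)."""
    diag = [1.0 + rng.gauss(0.0, sigma) for _ in range(n)]
    sup = [-2.0 + rng.gauss(0.0, sigma) for _ in range(n - 1)]
    return diag, sup


def last_column_of_inverse(diag, sup):
    """Solve A x = e_n by back substitution; x is the last column of A^{-1}."""
    n = len(diag)
    x = [0.0] * n
    x[n - 1] = 1.0 / diag[n - 1]
    for i in range(n - 2, -1, -1):
        # Row i reads diag[i] * x[i] + sup[i] * x[i + 1] = 0.
        x[i] = -sup[i] * x[i + 1] / diag[i]
    return x


rng = random.Random(0)
diag, sup = perturbed_bidiagonal(30, 0.01, rng)
x = last_column_of_inverse(diag, sup)
top_entry = abs(x[0])  # a lower bound on the largest entry of A^{-1}
```

Each step of the back substitution multiplies the entry by roughly 2, so the top entry of the last column of $A^{-1}$ grows like $2^{n-1}$ no matter how the non-zero entries are perturbed, provided the perturbation is small relative to the entries.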
7.3 Open Problems 

Questions that naturally follow from this work are: 

• What is the probability that the perturbation of an arbitrary matrix has large growth factors 
under Gaussian elimination with partial pivoting? 

• What is the probability that the perturbation of an arbitrary matrix has large growth factors 
under Gaussian elimination with complete pivoting? 

• Can zero-preserving perturbations of symmetric matrices have large growth factors under 
partial pivoting or under complete pivoting? 

• Can zero-preserving perturbations of arbitrary matrices have large growth factors under complete pivoting?

For the first question, we point out that experimental data of Trefethen and Bau [TB97, p. 168] suggest that the probability that the perturbation of an arbitrary matrix has large growth factor under partial pivoting may be exponentially smaller than without pivoting. This leads us to conjecture:

Conjecture 3. Let $\bar{A}$ be an $n$-by-$n$ matrix for which $\|\bar{A}\|_2 \le 1$, and let $A$ be a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2 \le 1$. Let $U$ be the upper-triangular matrix obtained from the LU-factorization of $A$ with partial pivoting. There exist absolute constants $k_1$, $k_2$ and $\alpha$ for which
\[
\Pr\left[\|U\|_{\max}/\|A\|_{\max} > x + 1\right] \le n^{k_1} e^{-\alpha x^{k_2}\sigma}.
\]

Finally, we ask whether similar analyses can be performed for other algorithms of Numerical 
Analysis. One might start by extending Smale's program by analyzing the smoothed values of 
other condition numbers. 
7.4 Recent Progress

Since the announcement of our result, Wschebor [Wsc04] improved the smoothed bound on the condition number.

Theorem 7.1 (Wschebor). Let $\bar{A}$ be an $n \times n$ matrix and let $A$ be a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2 \le 1$. Then,
\[
\Pr\left[\kappa(A) \ge x\right] \le \frac{n}{x}\left(\frac{1}{4\sqrt{2\pi n}} + 7\left(5 + \frac{4\|\bar{A}\|_2^2(1 + \log n)}{\sigma^2 n}\right)^{1/2}\right).
\]
When $\|\bar{A}\|_2 \le \sqrt{n}$, his result implies
\[
\Pr\left[\kappa(A) \ge x\right] \le O\left(\frac{n\log n}{x\sigma}\right).
\]
We conjecture

Conjecture 4. Let $\bar{A}$ be an $n \times n$ matrix satisfying $\|\bar{A}\|_2 \le \sqrt{n}$, and let $A$ be a Gaussian perturbation of $\bar{A}$ of variance $\sigma^2 \le 1$. Then,
\[
\Pr\left[\kappa(A) \ge x\right] \le O\left(\frac{n}{x\sigma}\right).
\]


8 Acknowledgments 

We thank Alan Edelman for suggesting the name "smoothed analysis", for suggesting we examine growth factors, and for his continuing support of our efforts. We thank Juan Cuesta and Mario Wschebor for pointing out some mistakes in an early draft of this paper. We thank Felipe Cucker for bringing Wschebor's paper [Wsc04] to our attention. Finally, we thank the referees for their extraordinary efforts and many helpful suggestions.



References 

[ABB+99] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,
S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide, 
Third Edition. SIAM, Philadelphia, 1999. 

[AS64] Milton Abramowitz and Irene A. Stegun, editors. Handbook of Mathematical Functions 
with Formulas, Graphs, and Mathematical Tables, volume 55 of Applied mathematics 
series. U. S. Department of Commerce, Washington, DC, USA, 1964. Tenth printing, 
with corrections (December 1972). 






[BD02] Avrim Blum and John Dunagan. Smoothed analysis of the perceptron algorithm for 
linear programming. In SODA '02, pages 905-914, 2002. 

[Blu89] Lenore Blum. Lectures on a theory of computation and complexity over the reals (or 
an arbitrary ring). In Erica Jen, editor, The Proceedings of the 1989 Complex Systems 
Summer School, Santa Fe, New Mexico, volume 2, pages 1-47, June 1989. 

[Dem88] James Demmel. The probability that a numerical analysis problem is difficult. Math.
Comp., pages 449-480, 1988.

[Dem97] James Demmel. Applied Numerical Linear Algebra. SIAM, 1997. 

[DS01] K. R. Davidson and S. J. Szarek. In W. B. Johnson and J. Lindenstrauss, editors, 
Handbook on the Geometry of Banach spaces, chapter Local operator theory, random 
matrices, and Banach spaces, pages 317-366. Elsevier Science, 2001. 

[DST02] John Dunagan, Daniel A. Spielman, and Shang-Hua Teng. Smoothed analysis of Renegar's condition number for linear programming. Available at http://math.mit.edu/~spielman/SmoothedAnalysis, 2002.

[Ede88] Alan Edelman. Eigenvalues and condition numbers of random matrices. SIAM J. Matrix
Anal. Appl., 9(4):543-560, 1988.

[Ede92] Alan Edelman. Eigenvalue roulette and random test matrices. In Marc S. Moonen, 
Gene H. Golub, and Bart L. R. De Moor, editors, Linear Algebra for Large Scale and 
Real-Time Applications, NATO ASI Series, pages 365-368. 1992. 

[GL83] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins Series 
in the Mathematical Sciences. The Johns Hopkins University Press and North Oxford 
Academic, Baltimore, MD, USA and Oxford, England, 1983. 

[GLS91] Martin Grötschel, László Lovász, and Alexander Schrijver. Geometric Algorithms and
Combinatorial Optimization. Springer-Verlag, 1991.

[Hig90] Nick Higham. How accurate is Gaussian elimination? In Numerical Analysis 1989,
Proceedings of the 13th Dundee Conference, volume 228 of Pitman Research Notes in 
Mathematics, pages 137-154, 1990. 

[JKB95] N. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions, volume 2. Wiley-Interscience, 1995.

[KJ82] Samuel Kotz and Norman L. Johnson, editors. Encyclopedia of Statistical Sciences, 
volume 6. John Wiley & Sons, 1982. 






[LT91] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces. Springer-Verlag,
1991.

[Ren95] J. Renegar. Incorporating condition measures into the complexity theory of linear programming. SIAM J. Optim., 5(3):506-524, 1995.

[Seg00] Yoav Seginer. The expected norm of random matrices. Combinatorics, Probability and
Computing, 9:149-166, 2000. 

[Sma97] Steve Smale. Complexity theory and numerical analysis. Acta Numerica, pages 523-551, 
1997. 

[Smo] http://math.mit.edu/~spielman/SmoothedAnalysis.

[ST03] Daniel Spielman and Shang-Hua Teng. Smoothed analysis of termination of linear 
programming algorithms. Mathematical Programming, Series B, 97:375-404, 2003. 

[ST04] Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the 
simplex algorithm usually takes polynomial time. J. ACM, 51(3):385-463, 2004. 

[TB97] L. N. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997. 

[TS90] Lloyd N. Trefethen and Robert S. Schreiber. Average-case stability of Gaussian elimination. SIAM Journal on Matrix Analysis and Applications, 11(3):335-360, 1990.

[Wil61] J. H. Wilkinson. Error analysis of direct methods of matrix inversion. J. Assoc. Comput. 
Mach., 8:261-330, 1961. 

[Wsc04] M. Wschebor. Smoothed analysis of κ(A). J. of Complexity, 20(1):97-107, February
2004.

[YC97] Man-Chung Yeung and Tony F. Chan. Probabilistic analysis of Gaussian elimination
without pivoting. SIAM J. Matrix Anal. Appl., 18(2):499-517, 1997.

A Gaussian random variables 

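Lemma A.1. Let $X$ be a univariate Gaussian random variable with mean 0 and standard deviation 1. Then for all $k \ge 1$,
\[
\Pr\left[X \ge k\right] \le \frac{1}{\sqrt{2\pi}}\,\frac{e^{-\frac{1}{2}k^2}}{k}.
\]

Proof. We have
\[
\Pr\left[X \ge k\right] = \frac{1}{\sqrt{2\pi}}\int_k^\infty e^{-\frac{1}{2}x^2}\,dx;
\]
putting $t = \frac{1}{2}x^2$,
\[
= \frac{1}{\sqrt{2\pi}}\int_{\frac{1}{2}k^2}^\infty \frac{e^{-t}}{\sqrt{2t}}\,dt
\le \frac{1}{\sqrt{2\pi}}\int_{\frac{1}{2}k^2}^\infty \frac{e^{-t}}{k}\,dt
= \frac{1}{\sqrt{2\pi}}\,\frac{e^{-\frac{1}{2}k^2}}{k}.
\]
□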
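
As a sanity check on Lemma A.1, the Python sketch below compares the exact standard normal tail, computed from the complementary error function, with the bound of the lemma at a few values of $k$. The function names and the sample points are illustrative choices.

```python
import math


def gaussian_tail(k):
    """Exact Pr[X >= k] for a standard normal X."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def lemma_a1_bound(k):
    """Right-hand side of Lemma A.1: exp(-k^2/2) / (sqrt(2 pi) k)."""
    return math.exp(-0.5 * k * k) / (math.sqrt(2.0 * math.pi) * k)


# (k, exact tail, bound) for a few sample thresholds k >= 1.
checks = [(k, gaussian_tail(k), lemma_a1_bound(k)) for k in (1.0, 2.0, 3.0, 5.0)]
```

The bound is loose near $k = 1$ and becomes relatively tight as $k$ grows, which is the regime in which the lemma is applied.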



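Lemma A.2. Let $\mathbf{x}$ be a $d$-dimensional Gaussian random vector of variance $\sigma^2$, let $\mathbf{t}$ be a unit vector, and let $\lambda$ be a real. Then,
\[
\Pr\left[\left|\mathbf{t}^T\mathbf{x} - \lambda\right| \le \epsilon\right] \le \sqrt{\frac{2}{\pi}}\,\frac{\epsilon}{\sigma}.
\]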
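
Since $\mathbf{t}^T\mathbf{x}$ is a scalar Gaussian of variance $\sigma^2$ for any unit vector $\mathbf{t}$, Lemma A.2 reduces to a one-dimensional statement, which the following Python sketch checks against the exact probability. The function names, the parameter `mean` (standing for the mean of $\mathbf{t}^T\mathbf{x}$), and the test cases are illustrative assumptions.

```python
import math


def prob_within(eps, sigma, mean=0.0, lam=0.0):
    """Exact Pr[|g - lam| <= eps] for a scalar Gaussian g ~ N(mean, sigma^2)."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf((lam + eps - mean) / sigma) - cdf((lam - eps - mean) / sigma)


def lemma_a2_bound(eps, sigma):
    """Right-hand side of Lemma A.2: sqrt(2/pi) * eps / sigma."""
    return math.sqrt(2.0 / math.pi) * eps / sigma


# (eps, sigma, mean) triples; lam is kept at 0 without loss of generality.
cases = [(0.01, 1.0, 0.0), (0.1, 0.5, 0.0), (0.1, 0.5, 0.3), (1.0, 1.0, 2.0)]
results = [(prob_within(e, s, m), lemma_a2_bound(e, s)) for e, s, m in cases]
```

The bound is the peak Gaussian density $1/(\sqrt{2\pi}\,\sigma)$ times the interval length $2\epsilon$, so it is nearly attained when the interval is short and centered at the mean.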



Lemma A.3. Let $g_1, \ldots, g_n$ be Gaussian random variables of mean 0 and variance 1. Then,
\[
\mathbf{E}\left[\max_i |g_i|\right] \le \sqrt{2\ln(\max(n,2))} + \frac{1}{\sqrt{2\pi}\,\ln(\max(n,2))}.
\]

Proof. For any $a \ge 1$,
\[
\begin{aligned}
\mathbf{E}\left[\max_i |g_i|\right]
&= \int_{t=0}^\infty \Pr\left[\max_i |g_i| \ge t\right] dt \\
&\le \int_{t=0}^{a} 1\,dt + \int_a^\infty n\Pr\left[|g_1| \ge t\right] dt \\
&\le a + \int_a^\infty n\,\frac{2}{\sqrt{2\pi}}\,\frac{e^{-\frac{1}{2}t^2}}{t}\,dt \qquad\text{(applying Lemma A.1)} \\
&\le a + \frac{2n}{\sqrt{2\pi}}\int_a^\infty \frac{t\,e^{-\frac{1}{2}t^2}}{a^2}\,dt \\
&= a + \frac{2n}{\sqrt{2\pi}}\,\frac{1}{a^2}\,e^{-\frac{1}{2}a^2}.
\end{aligned}
\]
Setting $a = \sqrt{2\ln(\max(n,2))}$, which is greater than 1 for all $n \ge 1$, we obtain the following upper bound on the expectation:
\[
\sqrt{2\ln(\max(n,2))} + \frac{2n}{\sqrt{2\pi}}\,\frac{1}{2\ln(\max(n,2))}\,\frac{1}{\max(n,2)} \le \sqrt{2\ln(\max(n,2))} + \frac{1}{\sqrt{2\pi}\,\ln(\max(n,2))}.
\]
□

Lemma A.4 (Expectation of reciprocal of the 1-norm of a Gaussian vector). Let $\bar{\mathbf{a}}$ be an arbitrary column vector in $\mathbb{R}^n$ for $n \ge 2$. Let $\mathbf{a}$ be a Gaussian perturbation of $\bar{\mathbf{a}}$ of variance $\sigma^2$. Then
\[
\mathbf{E}\left[\frac{1}{\|\mathbf{a}\|_1}\right] \le \frac{2}{(n-1)\sigma}.
\]

Proof. Let $\mathbf{a} = (a_1, \ldots, a_n)$. It is clear that the expectation of $1/\|\mathbf{a}\|_1$ is maximized if $\bar{\mathbf{a}} = 0$, so we will make this assumption. Without loss of generality, we also assume $\sigma^2 = 1$. For general $\sigma$, we can simply scale the bound by the factor $1/\sigma$.

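Recall that the Laplace transform of a positive random variable $X$ is defined by
\[
\mathcal{L}[X](t) = \mathbf{E}_X\left[e^{-tX}\right],
\]
and the expectation of the reciprocal of a random variable is simply the integral of its Laplace transform.

Let $X$ be the absolute value of a standard normal random variable. The Laplace transform of $X$ is given by
\[
\begin{aligned}
\mathcal{L}[X](t)
&= \sqrt{\frac{2}{\pi}}\int_0^\infty e^{-tx}e^{-\frac{1}{2}x^2}\,dx \\
&= \sqrt{\frac{2}{\pi}}\,e^{\frac{1}{2}t^2}\int_0^\infty e^{-\frac{1}{2}(x+t)^2}\,dx \\
&= \sqrt{\frac{2}{\pi}}\,e^{\frac{1}{2}t^2}\int_t^\infty e^{-\frac{1}{2}x^2}\,dx \\
&= e^{\frac{1}{2}t^2}\,\mathrm{erfc}\left(\frac{t}{\sqrt{2}}\right).
\end{aligned}
\]
Taking second derivatives, and applying the inequality (c.f. [AS64, 26.2.13])
\[
\frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-\frac{1}{2}t^2}\,dt \ge \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}}\,\frac{1}{x + 1/x},
\]
we find that $e^{\frac{1}{2}t^2}\,\mathrm{erfc}\left(\frac{t}{\sqrt{2}}\right)$ is convex.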
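We now set a constant $c = 2.4$ and set $\alpha$ to satisfy
\[
1 - \frac{\sqrt{c/\pi}}{\alpha} = e^{\frac{1}{2}(c/\pi)}\,\mathrm{erfc}\left(\frac{\sqrt{c/\pi}}{\sqrt{2}}\right).
\]
Numerically, we find that $\alpha \approx 1.9857 < 2$.

As $e^{\frac{1}{2}t^2}\,\mathrm{erfc}\left(\frac{t}{\sqrt{2}}\right)$ is convex, we have the upper bound
\[
e^{\frac{1}{2}t^2}\,\mathrm{erfc}\left(\frac{t}{\sqrt{2}}\right) \le 1 - \frac{t}{\alpha}, \quad\text{for } 0 \le t \le \sqrt{c/\pi}.
\]
For $t > \sqrt{c/\pi}$, we apply the upper bound
\[
e^{\frac{1}{2}t^2}\,\mathrm{erfc}\left(\frac{t}{\sqrt{2}}\right) \le \sqrt{\frac{2}{\pi}}\,\frac{1}{t},
\]
which follows from Lemma A.1. We now have
\[
\begin{aligned}
\mathbf{E}\left[\frac{1}{\|\mathbf{a}\|_1}\right]
&= \int_0^\infty \left(e^{\frac{1}{2}t^2}\,\mathrm{erfc}\left(t/\sqrt{2}\right)\right)^n dt \\
&\le \int_0^{\sqrt{c/\pi}} \left(1 - \frac{t}{\alpha}\right)^n dt + \int_{\sqrt{c/\pi}}^\infty \left(\sqrt{\frac{2}{\pi}}\,\frac{1}{t}\right)^n dt \\
&\le \frac{\alpha}{n+1} + \sqrt{\frac{2}{\pi}}\,\frac{(2/c)^{(n-1)/2}}{n-1} \\
&\le \frac{2}{n+1} + \sqrt{\frac{2}{\pi}}\,\frac{(2/c)^{(n-1)/2}}{n-1} \\
&\le \frac{2}{n-1},
\end{aligned}
\]
for $n \ge 2$. To verify this last inequality, one can multiply through by $(n+1)(n-1)$ to obtain
\[
\sqrt{\frac{2}{\pi}}\,(n+1)\,(2/c)^{(n-1)/2} \le 4,
\]
which one can verify by taking the derivative of the left-hand side to find the point where it is maximized, $n = (2 + \ln(5/6))/\ln(6/5)$. □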
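
The two numerical facts used in this proof are easy to reproduce. The Python sketch below recomputes $\alpha$ from its defining equation with $c = 2.4$ and evaluates the left-hand side of the final inequality over a range of $n$; the variable names and the range 2 to 199 are illustrative choices.

```python
import math

c = 2.4
s = math.sqrt(c / math.pi)  # breakpoint sqrt(c / pi) between the two bounds

# Value of exp(t^2 / 2) * erfc(t / sqrt(2)) at the breakpoint t = s.
value_at_s = math.exp(s * s / 2) * math.erfc(s / math.sqrt(2))

# alpha makes 1 - t / alpha the chord through (0, 1) and (s, value_at_s).
alpha = s / (1.0 - value_at_s)


def final_check(n):
    """Left-hand side of sqrt(2/pi) * (n + 1) * (2/c)^((n - 1)/2) <= 4."""
    return math.sqrt(2 / math.pi) * (n + 1) * (2 / c) ** ((n - 1) / 2)


worst = max(final_check(n) for n in range(2, 200))
```

The maximum of the left-hand side over integers is attained near $n = 10$, in agreement with the stationary point $n = (2 + \ln(5/6))/\ln(6/5) \approx 9.97$, and it stays below 4.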






B Random point on a sphere 

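Lemma B.1. Let $d \ge 2$ and let $(u_1, \ldots, u_d)$ be a unit vector chosen uniformly at random in $\mathbb{R}^d$. Then, for $c \le 1$,
\[
\Pr\left[|u_1| \ge \sqrt{\frac{c}{d}}\right] \ge \Pr\left[|G| \ge \sqrt{c}\right],
\]
where $G$ is a Gaussian random variable of variance 1 and mean 0.

Proof. We may obtain a random unit vector by choosing $d$ independent Gaussian random variables of variance 1 and mean 0, $x_1, \ldots, x_d$, and setting
\[
u_i = \frac{x_i}{\sqrt{x_1^2 + \cdots + x_d^2}}.
\]
We have
\[
\begin{aligned}
\Pr\left[u_1^2 \ge \frac{c}{d}\right]
&= \Pr\left[\frac{x_1^2}{x_1^2 + \cdots + x_d^2} \ge \frac{c}{d}\right] \\
&= \Pr\left[\frac{(d-1)x_1^2}{x_2^2 + \cdots + x_d^2} \ge \frac{(d-1)c}{d-c}\right] \\
&\ge \Pr\left[\frac{(d-1)x_1^2}{x_2^2 + \cdots + x_d^2} \ge c\right], \quad\text{since } c \le 1.
\end{aligned}
\]
We now note that
\[
t_d \stackrel{\mathrm{def}}{=} \frac{\sqrt{d-1}\;x_1}{\sqrt{x_2^2 + \cdots + x_d^2}}
\]
is a random variable distributed according to the t-distribution with $d-1$ degrees of freedom. The lemma now follows from the fact (c.f. [JKB95, Chapter 28, Section 2] or [AS64, 26.7.5]) that, for $c > 0$,
\[
\Pr\left[t_d > \sqrt{c}\right] \ge \Pr\left[G > \sqrt{c}\right],
\]
and that the distributions of $t_d$ and $G$ are symmetric about the origin. □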



C Combination Lemmas 

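Lemma C.1. Let $A$ and $B$ be two positive random variables. Assume

1. $\Pr[A \ge x] \le f(x)$.

2. $\Pr[B \ge x \mid A] \le g(x)$.

where $g$ is monotonically decreasing and $\lim_{x\to\infty} g(x) = 0$. Then,
\[
\Pr\left[AB \ge x\right] \le \int_0^\infty f\left(\frac{x}{t}\right)\left(-g'(t)\right) dt.
\]

Proof. Let $\mu_A$ denote the probability measure associated with $A$. We have
\[
\begin{aligned}
\Pr\left[AB \ge x\right]
&= \int_0^\infty \Pr_B\left[B \ge x/s \mid A\right] d\mu_A(s) \\
&\le \int_0^\infty g\left(\frac{x}{s}\right) d\mu_A(s); \\
\text{integrating by parts,}\quad
&= \int_0^\infty \Pr\left[A \ge s\right]\frac{d}{ds}\,g\left(\frac{x}{s}\right) ds \\
&\le \int_0^\infty f(s)\,\frac{d}{ds}\,g\left(\frac{x}{s}\right) ds; \\
\text{setting } t = x/s,\quad
&= \int_0^\infty f\left(\frac{x}{t}\right)\left(-g'(t)\right) dt.
\end{aligned}
\]
□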

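Corollary C.2 (linear-linear). Let $A$ and $B$ be two positive random variables. Assume

1. $\Pr[A \ge x] \le \frac{\alpha}{x}$ and

2. $\Pr[B \ge x \mid A] \le \frac{\beta}{x}$

for some $\alpha, \beta > 0$. Then,
\[
\Pr\left[AB \ge x\right] \le \frac{\alpha\beta}{x}\left(1 + \max\left(0, \ln\left(\frac{x}{\alpha\beta}\right)\right)\right).
\]

Proof. As the probability of an event can be at most 1,
\[
\Pr\left[A \ge x\right] \le \min\left(\frac{\alpha}{x}, 1\right) \stackrel{\mathrm{def}}{=} f(x), \quad\text{and}\quad
\Pr\left[B \ge x\right] \le \min\left(\frac{\beta}{x}, 1\right) \stackrel{\mathrm{def}}{=} g(x).
\]
Applying Lemma C.1 while observing

• $g'(t) = 0$ for $t \in [0, \beta]$, and

• $f(x/t) = 1$ for $t \ge x/\alpha$,

we obtain
\[
\begin{aligned}
\Pr\left[AB \ge x\right]
&\le \int_0^\beta \frac{\alpha t}{x}\cdot 0\,dt + \max\left(0, \int_\beta^{x/\alpha} \frac{\alpha t}{x}\,\frac{\beta}{t^2}\,dt\right) + \int_{x/\alpha}^\infty \frac{\beta}{t^2}\,dt \\
&= \max\left(0, \int_\beta^{x/\alpha} \frac{\alpha\beta}{xt}\,dt\right) + \frac{\alpha\beta}{x} \\
&= \frac{\alpha\beta}{x}\left(1 + \max\left(0, \ln\left(\frac{x}{\alpha\beta}\right)\right)\right),
\end{aligned}
\]
where the max appears in case $x/\alpha < \beta$. □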

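Corollary C.3. Let $A$ and $B$ be two positive random variables. If

1. $\forall x \ge 1/\sigma$, $\Pr[A \ge x] \le \min\left(1, \frac{\alpha + \beta\sqrt{\ln(x\sigma)}}{x\sigma}\right)$, and

2. $\Pr[B \ge x \mid A] \le \frac{\gamma}{x\sigma}$

for some $\alpha \ge 1$ and $\beta, \gamma, \sigma > 0$, then,
\[
\forall x \ge \gamma/\sigma^2, \qquad \Pr\left[AB \ge x\right] \le \frac{\alpha\gamma}{x\sigma^2}\left(1 + \left(\frac{2\beta}{3\alpha} + 1\right)\ln^{3/2}\left(\frac{x\sigma^2}{\gamma}\right)\right).
\]

Proof. Define $f$ and $g$ by
\[
f(x) \stackrel{\mathrm{def}}{=} \begin{cases} 1 & \text{for } x \le \frac{\alpha}{\sigma}, \\[2pt] \frac{\alpha + \beta\sqrt{\ln(x\sigma)}}{x\sigma} & \text{for } x > \frac{\alpha}{\sigma}, \end{cases}
\qquad
g(x) \stackrel{\mathrm{def}}{=} \begin{cases} 1 & \text{for } x \le \frac{\gamma}{\sigma}, \\[2pt] \frac{\gamma}{x\sigma} & \text{for } x > \frac{\gamma}{\sigma}. \end{cases}
\]
Applying Lemma C.1 while observing

• $g'(t) = 0$ for $t \in [0, \gamma/\sigma]$, and

• $f(x/t) = 1$ for $t \ge x\sigma/\alpha$,

we obtain
\[
\begin{aligned}
\Pr\left[AB \ge x\right]
&\le \int_{\gamma/\sigma}^{x\sigma/\alpha} \frac{\alpha + \beta\sqrt{\ln(x\sigma/t)}}{x\sigma/t}\,\frac{\gamma}{t^2\sigma}\,dt + \int_{x\sigma/\alpha}^\infty \frac{\gamma}{\sigma t^2}\,dt \\
&= \int_{\gamma/\sigma}^{x\sigma/\alpha} \frac{\left(\alpha + \beta\sqrt{\ln(x\sigma/t)}\right)\gamma}{x\sigma^2\,t}\,dt + \frac{\alpha\gamma}{x\sigma^2}
\end{aligned}
\]
(substituting $s = \sqrt{\ln(x\sigma/t)}$, $t = x\sigma e^{-s^2}$, which is defined as $x \ge \gamma/\sigma^2$,)
\[
\begin{aligned}
&= \int_{\sqrt{\ln(x\sigma^2/\gamma)}}^{\sqrt{\ln\alpha}} \frac{(\alpha + \beta s)\,\gamma}{x\sigma^2\cdot x\sigma e^{-s^2}}\;x\sigma\left(-2s\,e^{-s^2}\right) ds + \frac{\alpha\gamma}{x\sigma^2} \\
&= \frac{\gamma}{x\sigma^2}\int_{\sqrt{\ln\alpha}}^{\sqrt{\ln(x\sigma^2/\gamma)}} 2s(\alpha + \beta s)\,ds + \frac{\alpha\gamma}{x\sigma^2} \\
&= \frac{\alpha\gamma}{x\sigma^2}\left(1 + \ln\left(\frac{x\sigma^2}{\alpha\gamma}\right) + \frac{2\beta}{3\alpha}\left(\ln^{3/2}\left(\frac{x\sigma^2}{\gamma}\right) - \ln^{3/2}\alpha\right)\right) \\
&\le \frac{\alpha\gamma}{x\sigma^2}\left(1 + \left(\frac{2\beta}{3\alpha} + 1\right)\ln^{3/2}\left(\frac{x\sigma^2}{\gamma}\right)\right),
\end{aligned}
\]
as $\alpha \ge 1$. □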



Lemma C.4 (linear-bounded expectation). Let $A$, $B$ and $C$ be positive random variables such that
\[
\Pr\left[A \ge x\right] \le \frac{\alpha}{x},
\]
for some $\alpha > 0$, and
\[
\forall A, \quad \Pr\left[B \ge x \mid A\right] \le \Pr\left[C \ge x\right].
\]
Then,
\[
\Pr\left[AB \ge x\right] \le \frac{\alpha}{x}\,\mathbf{E}\left[C\right].
\]

Proof. Let $g(x)$ be the distribution function of $C$. By Lemma C.1, we have
\[
\begin{aligned}
\Pr\left[AB \ge x\right]
&\le \int_0^\infty \frac{\alpha t}{x}\left(-(1-g)'(t)\right) dt \\
&= \frac{\alpha}{x}\int_0^\infty t\,g'(t)\,dt \\
&= \frac{\alpha}{x}\,\mathbf{E}\left[C\right].
\end{aligned}
\]
□
Corollary C.5 (linear-chi). Let $A$ be a positive random variable such that
\[
\Pr\left[A \ge x\right] \le \frac{\alpha}{x},
\]
for some $\alpha > 0$. Let $\mathbf{b}$ be a $d$-dimensional Gaussian random vector (possibly depending upon $A$) of variance at most $\sigma^2$ centered at a vector of norm at most $t$, and let $B = \|\mathbf{b}\|_2$. Then,
\[
\Pr\left[AB \ge x\right] \le \frac{\alpha\sqrt{\sigma^2 d + t^2}}{x}.
\]

Proof. As $\mathbf{E}[B] \le \sqrt{\mathbf{E}[B^2]}$, and it is known [KJ82, p. 277] that the expected value of $B^2$ (the non-central $\chi^2$-distribution with non-centrality parameter $\|\bar{\mathbf{b}}\|_2^2$) is $\sigma^2 d + \|\bar{\mathbf{b}}\|_2^2$, the corollary follows from Lemma C.4. □

Lemma C.6 (Linear to log). Let $A$ be a positive random variable. If there exist an $A_0 \ge 1$ and an $\alpha \ge 1$ such that for all $x \ge A_0$,
\[
\Pr\left[A \ge x\right] \le \frac{\alpha}{x},
\]
then
\[
\mathbf{E}_A\left[\max(0, \ln A)\right] \le \ln\max(A_0, \alpha) + 1.
\]

Proof.
\[
\begin{aligned}
\mathbf{E}_A\left[\max(0, \ln A)\right]
&= \int_{x=0}^\infty \Pr\left[\max(0, \ln A) \ge x\right] dx \\
&\le \int_{x=0}^{\ln\max(A_0,\alpha)} 1\,dx + \int_{x=\ln\max(A_0,\alpha)}^\infty \Pr\left[\ln A \ge x\right] dx \\
&\le \ln\max(A_0, \alpha) + \int_{x=\ln\max(A_0,\alpha)}^\infty \alpha e^{-x}\,dx \\
&\le \ln\max(A_0, \alpha) + 1.
\end{aligned}
\]
□
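
Lemma C.6 is tight for a variable whose tail is exactly $\Pr[A \ge x] = \min(1, \alpha/x)$. The Python sketch below integrates $\Pr[\ln A \ge x] = \min(1, \alpha e^{-x})$ numerically and compares the result with $\ln\alpha + 1$. The choice $\alpha = 8$, $A_0 = 1$, the truncation point, and the midpoint rule are illustrative assumptions.

```python
import math


def expected_log_pareto(alpha, upper=60.0, steps=600_000):
    """Midpoint-rule integral of Pr[ln A >= x] = min(1, alpha * exp(-x))
    over [0, upper], for A with tail Pr[A >= x] = min(1, alpha / x)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += min(1.0, alpha * math.exp(-x)) * h
    return total


alpha = 8.0
numeric = expected_log_pareto(alpha)
bound = math.log(alpha) + 1.0  # Lemma C.6 with A_0 = 1 <= alpha
```

The two values agree up to the discretization error, which shows that the additive constant 1 in the lemma (and hence the $\log_2 e$ terms in the proof of Theorem 5.1) cannot be improved for tails of this form.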